The discovery of particle filtering methods has enabled the use of nonlinear filtering in a wide array of applications. Unfortunately, the approximation error of particle filters typically grows exponentially in the dimension of the underlying model. This phenomenon has rendered particle filters of limited use in complex data assimilation problems. In this paper, we argue that it is often possible, at least in principle, to develop local particle filtering algorithms whose approximation error is dimension-free. The key to such developments is the decay of correlations property, which is a spatial counterpart of the much better understood stability property of nonlinear filters. For the simplest possible algorithm of this type, our results provide under suitable assumptions an approximation error bound that is uniform both in time and in the model dimension. More broadly, our results provide a framework for the investigation of filtering problems and algorithms in high dimension.
1. Introduction and background. A fundamental problem in a broad range of applications is the combination of observed data and dynamical models. Particularly in highly complex systems with partial observations, the effective extraction and utilization of the information contained in observed data can only be accomplished by exploiting the availability of accurate predictive models of the underlying dynamical phenomena of interest. Such problems arise in applications that range from classical tracking problems in navigation and robotics to extremely large-scale problems such as weather forecasting. In the latter setting, and in other complex applications in the geophysical, atmospheric and ocean sciences, incorporating observed data into dynamical models is called data assimilation.

From a statistical perspective, it is in principle simple to formulate the optimal solution to the data assimilation problem. We model the dynamics and observations jointly as a bivariate Markov chain $(X_n, Y_n)_{n\ge 0}$ taking values in a possibly high-dimensional state space $\mathbb{X}\times\mathbb{Y}$ (throughout this paper we will consider discrete time models for simplicity; continuous time models may also be considered). The process $(X_n)_{n\ge 0}$ describes the underlying dynamics of interest, while the process $(Y_n)_{n\ge 0}$ denotes the observed data. To estimate the hidden state $X_n$ based on the observation history $Y_1,\ldots,Y_n$ to date, we introduce the nonlinear filter

$$\pi_n := \mathbf{P}[X_n \in \cdot \,|\, Y_1,\ldots,Y_n].$$
If the conditional distribution $\pi_n$ can be computed, it yields an optimal (least mean square) estimate of $X_n$ as well as a complete representation of the uncertainty in this estimate. Moreover, an important property of the filter is that it is recursive: $\pi_n$ depends only on $\pi_{n-1}$ and the new observation $Y_n$. This is crucial in practice, as it allows the filter to be implemented on-line over a long time horizon.

In practice, however, the optimal filter is almost never directly computable: it requires the propagation of an entire conditional distribution, which generally does not admit any efficiently computable sufficient statistics.¹ The practical implementation of nonlinear filtering was therefore long considered to be intractable until the discovery of a class of surprisingly efficient sequential Monte Carlo algorithms, known as particle filters, for approximating the filter. The simplest such algorithm simply inserts a random sampling step in the filtering recursion and approximates the filter $\pi_n$ by the resulting empirical measure $\hat\pi_n$ (cf. section 1.1 below). It is not difficult to show that this gives rise to a standard Monte Carlo error bound

$$\sup_{|f|\le 1} \mathbf{E}|\pi_n(f) - \hat\pi_n(f)| \le \frac{C}{\sqrt{N}},$$

where $N$ denotes the number of particles. Moreover, a crucial insight is that the constant $C$ typically does not depend on time $n$ due to the stability property of nonlinear filters [6, 5], so that the particle filter can indeed function in an on-line fashion. Particle filters have proved to perform extraordinarily well in many classical applications and are widely used in practice. We refer to [5] for a detailed overview of particle filtering algorithms and their analysis.

¹ Important exceptions are two special cases: linear Gaussian models (which give rise to the celebrated Kalman filter) and models with a (small) finite state space, cf. [5]. However, most complex models, such as those of interest to us here, do not fall into these very limited categories.

Unfortunately, despite their widespread success, particle filters have nonethe- 
less proved to be essentially useless in truly complex data assimilation problems. 
The reason for this, long known to practitioners, has only recently been subjected 
to mathematical analysis in the work of Bickel et al. [3, 14]. Roughly speaking, 
the constant C in the above bound, while independent of time n, must typically 
be exponential in the dimension of the state space of the underlying model. This 
curse of dimensionality does not affect most classical tracking problems, whose 
dimension is typically of order unity, but becomes absolutely prohibitive in large- 
scale data assimilation problems such as weather forecasting where model dimen- 
sions of order 10^ are routinely encountered [1]. While the curse of dimensionality 
problem in particle filters is now fairly well understood, there exists no rigorous 
approach to date for alleviating this problem [2, 17]. Practical data assimilation 
in high-dimensional models is therefore generally performed by means of ad-hoc 
algorithms, frequently based on (questionable) Gaussian approximations, that pos- 
sess limited theoretical justification [9, 11, 1]. The development of ideas that could 
enable the principled use of particle filters in high-dimensional settings remains 
a fundamental open problem in data assimilation and in numerous other complex 
filtering problems (for example, multi-target tracking, tracking the spread of epidemics, traffic flow prediction in freeway networks, etc.).

At the same time, the mathematical theory of nonlinear filtering in high dimension has remained essentially in its infancy. Although the study of large-scale interacting systems is an important topic in contemporary probability theory (frequently motivated by problems in statistical mechanics, e.g., [8, 12]), almost nothing is known about the emergence of high-dimensional phenomena in the setting of conditional distributions. It is not even entirely clear how filtering problems in high dimension can be fruitfully formulated, and what type of models should be investigated in this setting. Moreover, most mathematical tools used in nonlinear filtering theory (cf. [5]) are ill-suited to the investigation of the much more delicate problems that arise in high dimension. We have recently begun to explore high-dimensional probabilistic phenomena in nonlinear filtering [13, 16]. The present paper arose from the realization that such phenomena are not only of interest in their own right, but that they can provide mechanisms that enable the development and analysis of particle filtering algorithms in high dimension.

The central idea of this paper is that the decay of correlations property of high- 
dimensional filtering models, which is in essence a spatial counterpart of the much 
better understood stability property of nonlinear filters, can be exploited to develop 
local particle filters that avoid the curse of dimensionality. For the simplest possible 
algorithm of this type, we will prove under suitable assumptions an approximation 
error bound that is uniform both in time and in the model dimension. While it is
far from clear whether this simple algorithm is of immediate practical utility in the 
most complex real-world applications (a question far beyond the scope of this pa- 
per, cf. section 2.3), our results provide the first rigorous proof of concept that it is 
in fact possible, at least in principle, to develop particle filtering algorithms whose 
approximation error is dimension-free. A broader goal of this paper is to introduce 
a natural foundation for the investigation of filtering problems and algorithms in 
high dimension, as well as some basic mathematical tools for this purpose. 

In the remainder of this section, we provide some essential background on non- 
linear filtering, particle filtering algorithms, and the curse of dimensionality, as well 
as a brief overview of the general ideas and contributions of this paper. 

1.1. Classical filtering models and particle filters. A hidden Markov model is a Markov chain $(X_n, Y_n)_{n\ge 0}$ whose transition probability $P$ can be factored as

$$P((x,y),A) = \int \mathbf{1}_A(x',y')\, p(x,x')\, g(x',y')\, \psi(dx')\, \varphi(dy').$$

Thus $(X_n)_{n\ge 0}$ is itself a Markov chain in a Polish state space $\mathbb{X}$ with transition density $p:\mathbb{X}\times\mathbb{X}\to\mathbb{R}_+$ with respect to a given reference measure $\psi$, while the observations $(Y_n)_{n\ge 0}$ are conditionally independent given $(X_n)_{n\ge 0}$ in a Polish state space $\mathbb{Y}$ with transition density $g:\mathbb{X}\times\mathbb{Y}\to\mathbb{R}_+$ with respect to a reference measure $\varphi$. This dependence structure is illustrated in Figure 1. We interpret $(X_n)_{n\ge 0}$ as an underlying dynamical process that is not directly observable, while the observable process $(Y_n)_{n\ge 0}$ consists of partial and noisy observations of $(X_n)_{n\ge 0}$.

In the following, we will assume that the process $(X_n, Y_n)_{n\ge 0}$ is realized on its canonical probability space, and for any probability measure $\mu$ on $\mathbb{X}$ we denote by $\mathbf{P}^\mu$ the probability measure under which $(X_n, Y_n)_{n\ge 0}$ is a hidden Markov model with transition probability $P$ as above and with initial condition $X_0\sim\mu$. For $x\in\mathbb{X}$, we write for simplicity $\mathbf{P}^x := \mathbf{P}^{\delta_x}$. As the process $(X_n)_{n\ge 0}$ is unobservable, a central problem in this setting is to track the unobserved state $X_n$ given the observation history $Y_1,\ldots,Y_n$: that is, we aim to compute the nonlinear filter

$$\pi_n^\mu := \mathbf{P}^\mu[X_n \in \cdot \,|\, Y_1,\ldots,Y_n].$$
Fig 1. Dependency graph of a hidden Markov model.



It is well known, and easily verified using the Bayes formula, that the filter $\pi_n^\mu$ can be computed recursively: that is, we have the recursion (see, e.g., [5])

$$\pi_0^\mu = \mu, \qquad \pi_n^\mu = F_n\pi_{n-1}^\mu \quad (n\ge 1),$$

where

$$(F_n\rho)(f) := \frac{\int f(x')\, g(x',Y_n)\, p(x,x')\, \psi(dx')\, \rho(dx)}{\int g(x',Y_n)\, p(x,x')\, \psi(dx')\, \rho(dx)}.$$
It is instructive to write the recursion $F_n := C_nP$ in two steps

$$\pi_{n-1}^\mu \xrightarrow{\ \text{prediction}\ } P\pi_{n-1}^\mu \xrightarrow{\ \text{correction}\ } \pi_n^\mu = C_nP\pi_{n-1}^\mu,$$

where

$$(P\rho)(f) := \int f(x')\, p(x,x')\, \psi(dx')\, \rho(dx), \qquad (C_n\rho)(f) := \frac{\int f(x)\, g(x,Y_n)\, \rho(dx)}{\int g(x,Y_n)\, \rho(dx)}.$$

In the prediction step, the filter $\pi_{n-1}^\mu$ is propagated forward using the dynamics of the underlying unobserved process $(X_n)_{n\ge 0}$ to compute the predictive distribution $P\pi_{n-1}^\mu = \mathbf{P}^\mu[X_n\in\cdot\,|\,Y_1,\ldots,Y_{n-1}]$. Then, in the correction step, the predictive distribution is conditioned on the new observation $Y_n$ to obtain the filter $\pi_n^\mu$.
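For a model with a small finite state space (one of the exceptional cases in which the filter is exactly computable), the reference measure $\psi$ can be taken to be counting measure, and the prediction and correction steps become a matrix-vector product followed by a pointwise reweighting. The following is a minimal sketch under that assumption. The function names and the two-state model in the example are hypothetical illustrations, not taken from the paper.

```python
def predict(rho, P):
    """Prediction step: (P rho)(x') = sum_x rho(x) p(x, x'), with psi = counting measure."""
    n = len(rho)
    return [sum(rho[x] * P[x][xp] for x in range(n)) for xp in range(n)]


def correct(rho, lik):
    """Correction step: (C_n rho)(x) = g(x, Y_n) rho(x) / sum_z g(z, Y_n) rho(z)."""
    unnorm = [r * l for r, l in zip(rho, lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def exact_filter(mu, P, g, observations):
    """Return [pi_0, ..., pi_n] computed by pi_0 = mu and pi_n = C_n P pi_{n-1}."""
    pis = [list(mu)]
    for y in observations:
        lik = [g(x, y) for x in range(len(mu))]
        pis.append(correct(predict(pis[-1], P), lik))
    return pis


def obs_density(x, y):
    """Hypothetical likelihood: the observation equals the state with probability 0.8."""
    return 0.8 if y == x else 0.2


if __name__ == "__main__":
    # Hypothetical two-state chain with transition matrix P[x][x'] = p(x, x').
    P = [[0.9, 0.1], [0.2, 0.8]]
    print(exact_filter([0.5, 0.5], P, obs_density, [1]))
```

Starting from the uniform initial condition, one observation $Y_1 = 1$ moves the filter from $(0.5, 0.5)$ through the predictive distribution $(0.55, 0.45)$ to $\pi_1 = (0.11, 0.36)/0.47$.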

The recursive structure of the nonlinear filter is of central importance, as it allows the filter to be computed on-line over a long time horizon. Nonetheless, the recursion is still at the level of probability measures, and in general no finite-dimensional sufficient statistics exist. Therefore, the practical implementation of nonlinear filters typically proceeds by Monte Carlo approximation. The most common algorithm of this type simply inserts a sampling step in the filtering recursion: $\pi_n^\mu$ is approximated by the empirical distribution $\hat\pi_n^\mu$ computed by the recursion

$$\hat\pi_0^\mu = \mu, \qquad \hat\pi_n^\mu = \hat F_n\hat\pi_{n-1}^\mu \quad (n\ge 1),$$

where $\hat F_n := C_nS^NP$ consists of three steps

$$\hat\pi_{n-1}^\mu \xrightarrow{\ \text{prediction}\ } P\hat\pi_{n-1}^\mu \xrightarrow{\ \text{sampling}\ } S^NP\hat\pi_{n-1}^\mu \xrightarrow{\ \text{correction}\ } \hat\pi_n^\mu = C_nS^NP\hat\pi_{n-1}^\mu.$$

Here $N\ge 1$ is the number of particles used in the algorithm, and $S^N$ is the sampling operator that defines for a probability measure $\rho$ the random measure

$$S^N\rho := \frac{1}{N}\sum_{i=1}^N \delta_{x(i)}, \qquad (x(i))_{i=1,\ldots,N} \text{ are i.i.d. samples} \sim \rho$$

(if $\rho$ is a random measure, then $(x(i))_{i=1,\ldots,N}$ are drawn conditionally given $\rho$). This yields the bootstrap particle filtering algorithm described in Figure 2. This algorithm is exceedingly simple to implement, and it is easily shown that the particle filter $\hat\pi_n^\mu$ converges to the exact filter $\pi_n^\mu$ as $N\to\infty$. We refer to [5] for a detailed overview of particle filtering algorithms and their analysis.

Algorithm 1: Bootstrap particle filter

    Let $\hat\pi_0 = \mu$;
    for $k = 1,\ldots,n$ do
        Sample i.i.d. $x_{k-1}(i)$, $i = 1,\ldots,N$ from the distribution $\hat\pi_{k-1}$;
        Sample $x_k(i) \sim p(x_{k-1}(i),\cdot)\,d\psi$, $i = 1,\ldots,N$;
        Compute $w_k(i) = g(x_k(i),Y_k)/\sum_{\ell=1}^N g(x_k(\ell),Y_k)$, $i = 1,\ldots,N$;
        Let $\hat\pi_k = \sum_{i=1}^N w_k(i)\,\delta_{x_k(i)}$;

Fig 2. The classical bootstrap particle filtering algorithm.
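Algorithm 1 can be sketched directly in Python. This is a minimal illustration, not a reference implementation. The model enters only through three user-supplied callables, and their names `sample_mu`, `sample_p` and `g` are hypothetical. Multinomial resampling is done with the standard library's `random.Random.choices`.

```python
import random


def bootstrap_particle_filter(sample_mu, sample_p, g, observations, N, seed=0):
    """Sketch of Algorithm 1; returns the weighted particle systems (x_k, w_k), k = 1..n.

    sample_mu(rng)   -> one draw from the initial distribution mu
    sample_p(x, rng) -> one draw from p(x, .) d psi
    g(x, y)          -> observation density g(x, y)
    """
    rng = random.Random(seed)
    particles, weights = None, None  # pi_hat_0 = mu is handled by sampling from mu directly
    history = []
    for y in observations:
        # Sampling: i.i.d. x_{k-1}(i) from pi_hat_{k-1} (from mu when k = 1)
        if particles is None:
            prev = [sample_mu(rng) for _ in range(N)]
        else:
            prev = rng.choices(particles, weights=weights, k=N)
        # Prediction: x_k(i) ~ p(x_{k-1}(i), .); combined with the line above, this gives
        # N i.i.d. draws from P pi_hat_{k-1}, i.e. the random measure S^N P pi_hat_{k-1}
        particles = [sample_p(x, rng) for x in prev]
        # Correction: w_k(i) = g(x_k(i), Y_k) / sum_l g(x_k(l), Y_k)
        raw = [g(x, y) for x in particles]
        total = sum(raw)
        weights = [w / total for w in raw]
        history.append((particles, weights))
    return history
```

As a usage example, take a hypothetical two-state chain with uniform initial law, staying probabilities 0.9 and 0.8, and an observation that equals the state with probability 0.8. After one observation $Y_1 = 1$, the weighted fraction of particles in state 1 should approach the exact filter value $0.36/0.47$ as $N$ grows.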

To gain some insight into the approximation properties of the particle filter, let us perform the simplest possible error analysis. We define the distance

$$|||\rho-\rho'||| := \sup_{|f|\le 1}\mathbf{E}[|\rho(f)-\rho'(f)|^2]^{1/2}$$

between two random measures $\rho,\rho'$ on $\mathbb{X}$. It is an easy exercise to show that

$$|||P\rho - P\rho'||| \le |||\rho-\rho'|||, \qquad |||\rho - S^N\rho||| \le \frac{1}{\sqrt N}.$$
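For completeness, here is one way to carry out this exercise. It writes the action of $P$ on functions as $(Pf)(x) := \int f(x')\,p(x,x')\,\psi(dx')$, a notation introduced only for this derivation. Since $p(x,\cdot)$ is a probability density with respect to $\psi$, we have $|Pf|\le 1$ whenever $|f|\le 1$. The derivation also uses the fact that, conditionally on $\rho$, the samples defining $S^N\rho$ are i.i.d. with law $\rho$:

$$(P\rho)(f) = \rho(Pf) \quad\Longrightarrow\quad |||P\rho-P\rho'||| = \sup_{|f|\le 1}\mathbf{E}[|\rho(Pf)-\rho'(Pf)|^2]^{1/2} \le \sup_{|h|\le 1}\mathbf{E}[|\rho(h)-\rho'(h)|^2]^{1/2} = |||\rho-\rho'|||,$$

$$\mathbf{E}\big[|S^N\rho(f)-\rho(f)|^2\,\big|\,\rho\big] = \frac{\rho(f^2)-\rho(f)^2}{N} \le \frac{1}{N} \quad\Longrightarrow\quad |||\rho-S^N\rho||| \le \frac{1}{\sqrt N}.$$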

Let us assume for simplicity that the observation density $g$ is bounded away from zero and infinity, that is, $\kappa\le g(x,y)\le\kappa^{-1}$ for some $0<\kappa<1$. As

$$(C_n\rho)(f) - (C_n\rho')(f) = \frac{\rho(fg_n)-\rho'(fg_n)}{\rho(g_n)} + \frac{\rho'(fg_n)}{\rho'(g_n)}\,\frac{\rho'(g_n)-\rho(g_n)}{\rho(g_n)}$$

with $g_n(x) := g(x,Y_n)$, and as $|\kappa g_n|\le 1$ and $\rho(g_n)\ge\kappa$, we obtain

$$|||C_n\rho - C_n\rho'||| \le \frac{2}{\kappa^2}\,|||\rho-\rho'|||.$$
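The constant $2/\kappa^2$ can be sanity-checked numerically. For deterministic probability vectors on a finite set, $|||\rho-\rho'|||$ reduces to $\sup_{|f|\le 1}|\rho(f)-\rho'(f)| = \sum_x|\rho(x)-\rho'(x)|$. The argument above then gives the pointwise bound $\sum_x|(C_n\rho)(x)-(C_n\rho')(x)|\le 2\kappa^{-2}\sum_x|\rho(x)-\rho'(x)|$. The sketch below draws random measures and likelihood values satisfying $\kappa\le g\le\kappa^{-1}$ (a hypothetical test setup) and records the worst ratio it observes.

```python
import random


def correct(rho, g):
    """(C rho)(x) = g(x) rho(x) / sum_z g(z) rho(z) for a probability vector rho."""
    total = sum(r * w for r, w in zip(rho, g))
    return [r * w / total for r, w in zip(rho, g)]


def random_prob(rng, m):
    """A random probability vector with m strictly positive entries."""
    v = [rng.random() + 1e-12 for _ in range(m)]
    s = sum(v)
    return [x / s for x in v]


def l1(a, b):
    """sup over |f| <= 1 of |a(f) - b(f)| for measures on a finite set, i.e. the l1 distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def worst_ratio(kappa, m=20, trials=2000, seed=1):
    """Largest observed value of l1(C rho, C rho') / l1(rho, rho') over random trials."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        rho, rho2 = random_prob(rng, m), random_prob(rng, m)
        g = [rng.uniform(kappa, 1.0 / kappa) for _ in range(m)]  # kappa <= g <= 1/kappa
        worst = max(worst, l1(correct(rho, g), correct(rho2, g)) / l1(rho, rho2))
    return worst


if __name__ == "__main__":
    for kappa in (0.9, 0.5, 0.2):
        print(kappa, worst_ratio(kappa), 2.0 / kappa**2)
```

A random search of this kind can only confirm the bound, never prove it, and it says nothing about whether $2/\kappa^2$ is sharp.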



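Putting these bounds together, we find that

$$|||\pi_n^\mu - \hat\pi_n^\mu||| \le \frac{2}{\kappa^2}\bigg\{|||\pi_{n-1}^\mu - \hat\pi_{n-1}^\mu||| + \frac{1}{\sqrt N}\bigg\} \le \sum_{k=1}^n \Big(\frac{2}{\kappa^2}\Big)^k \frac{1}{\sqrt N},$$

where the second inequality is obtained by iterating the first inequality $n$ times. We therefore find that the bootstrap particle filter does indeed approximate the exact nonlinear filter with the typical Monte Carlo $1/\sqrt N$-rate.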
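Spelled out, the derivation uses the shorthand $e_n := |||\pi_n^\mu-\hat\pi_n^\mu|||$ and $a := 2/\kappa^2$, notation introduced only for this derivation. The first inequality combines the three bounds above through the triangle inequality for $|||\cdot|||$. The second follows by unrolling the recursion from $e_0 = 0$, which holds since $\pi_0^\mu = \hat\pi_0^\mu = \mu$:

$$e_n = |||C_nP\pi_{n-1}^\mu - C_nS^NP\hat\pi_{n-1}^\mu||| \le a\Big\{|||P\pi_{n-1}^\mu - P\hat\pi_{n-1}^\mu||| + |||P\hat\pi_{n-1}^\mu - S^NP\hat\pi_{n-1}^\mu|||\Big\} \le a\Big\{e_{n-1} + \frac{1}{\sqrt N}\Big\},$$

$$e_n \le a\,e_{n-1} + \frac{a}{\sqrt N} \le a^2 e_{n-2} + \frac{a^2+a}{\sqrt N} \le \cdots \le a^n e_0 + \frac{1}{\sqrt N}\sum_{k=1}^n a^k = \sum_{k=1}^n\Big(\frac{2}{\kappa^2}\Big)^k\frac{1}{\sqrt N}.$$

Since $a > 1$, the right-hand side is of order $a^n/\sqrt N$, so the bound grows exponentially in $n$.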

It should be noted that our crude error bound grows exponentially in time $n$. If the error were in fact to grow exponentially in time, this would make the particle filter largely useless in practice, as it could not be run reliably for more than a few time steps (in particular, it could not be run on-line over a long time horizon). Fortunately, however, the exponential growth of the error is an artifact of our crude bound and typically does not occur in practice. Our analysis has neglected an essential phenomenon: ergodicity of the underlying model will cause the filter to be stable, that is, $\pi_n^\mu$ forgets its initial condition $\mu$ as $n\to\infty$. The stability property provides a dissipation mechanism that mitigates the accumulation of approximation errors over time. A more sophisticated analysis that exploits this idea yields a time-uniform error bound, see section 3.1 below.

1.2. The curse of dimensionality. We have stated above that particle filters suffer from the curse of dimensionality. It is, however, far from obvious at this point
why this should be the case. Indeed, the state spaces X and Y have only been as- 
sumed to be Polish (a mild technical assumption meant only to ensure the existence 
of regular conditional probabilities), and no explicit notion of dimension appears in 
the above error bound. To understand why the above bound is typically exponen- 
tial in the model dimension, we must consider a suitable class of high-dimensional 
models in which the dependence on dimension can be explicitly investigated. In 
section 2, we will introduce a general class of high-dimensional filtering models 
that is prototypical of many data assimilation problems. In the present section, how- 
ever, we consider a much simpler class of trivial high-dimensional models that is 
useless in any application, but is nonetheless helpful for the purpose of developing 
intuition for dimensionality issues in particle filters. 

In a $d$-dimensional model, $X_n, Y_n$ are each described by $d$ coordinates $X_n^i, Y_n^i$, $i=1,\ldots,d$. To construct a trivial $d$-dimensional model, we simply start with a given one-dimensional model and duplicate it $d$ times. That is, let $(\tilde X_n,\tilde Y_n)_{n\ge 0}$ be a hidden Markov model on $\tilde{\mathbb{X}}\times\tilde{\mathbb{Y}}$ with transition density $\tilde p$ and observation density $\tilde g$ with respect to reference measures $\tilde\psi$ and $\tilde\varphi$, respectively. Then we set

$$\mathbb{X} = \tilde{\mathbb{X}}^d, \qquad \mathbb{Y} = \tilde{\mathbb{Y}}^d, \qquad \psi = \tilde\psi^{\otimes d}, \qquad \varphi = \tilde\varphi^{\otimes d},$$

and

$$p(x,z) = \prod_{i=1}^d \tilde p(x^i,z^i), \qquad g(x,y) = \prod_{i=1}^d \tilde g(x^i,y^i),$$

so that each coordinate $(X_n^i, Y_n^i)_{n\ge 0}$ is an independent copy of $(\tilde X_n,\tilde Y_n)_{n\ge 0}$. Note that we have used the term $d$-dimensional in the sense that our model has $d$ independent degrees of freedom: each degree of freedom can itself in principle take values in a high- or even infinite-dimensional state space $\tilde{\mathbb{X}}\times\tilde{\mathbb{Y}}$. This is, however, precisely the notion of dimension that is relevant to the curse of dimensionality (in [3, 14] this idea is sharpened by a notion of "effective dimension").

In this trivial setting, it is now easily seen how the curse of dimensionality arises in our error bound. Indeed, let us assume again for simplicity that $\kappa\le\tilde g(x,y)\le\kappa^{-1}$ for some $0<\kappa<1$. Then $\kappa^d\le g(x,y)\le\kappa^{-d}$, so we obtain a bound that is exponential in the dimension $d$ even after only one time step:

$$|||\pi_1^\mu - \hat\pi_1^\mu||| \le \frac{2}{\kappa^{2d}}\,\frac{1}{\sqrt N}.$$

An inspection of our bound clarifies the source of this exponential growth: even though the Monte Carlo sampling itself is dimension-free ($|||\rho - S^N\rho|||\le N^{-1/2}$ independently of dimension), the correction operator $C_n$, which is highly nonlinear, blows up the sampling error exponentially in high dimension. In particular, it is evidently the dimension of the observations, rather than that of the underlying model, that controls the exponential growth in our error bound.

Of course, the above analysis is far from convincing. First of all, we have only 
proved a rather crude upper bound on the approximation error, so that it might be 
possible that a more sophisticated bound would eliminate the exponential depen- 
dence on dimension as was done using the filter stability property to eliminate the 
exponential dependence on time. Second, one could argue that our strong notion of 
approximation with respect to the $|||\cdot|||$-norm is too restrictive to give meaningful results in high dimension (which is in fact the case: we will later consider local error
bounds instead), so that a weaker notion of approximation might avoid the expo- 
nential dependence on dimension. Unfortunately, the much more delicate analysis 
of Bickel et al. [3, 14] demonstrates conclusively that the curse of dimensionality 
of the bootstrap particle filter is a genuine phenomenon and not a mathematical 
deficiency of our analysis, as we will briefly explain presently. Nonetheless, both of the ideas raised above for eliminating the exponential dependence on dimension will
play an important role in the remainder of this paper. 
The key obstacle when the observations are high-dimensional is that the posterior measure $C_n\rho$ is nearly singular with respect to the prior measure $\rho$: in particular, a point that has high likelihood under $\rho$ has likelihood under $C_n\rho$ that is exponentially small in the dimension. Therefore, if we draw a fixed number of samples from $\rho$, then with very high probability every one of these samples will have exponentially small likelihood under $C_n\rho$ and, as is common in rare-event scenarios, the least unlikely sample will be exponentially more likely than any of the other samples. Thus $C_nS^N\rho$ will put almost all its mass on the sample with the largest likelihood, which effectively yields a Monte Carlo approximation of $C_n\rho$ with sample size 1 rather than $N$. This phenomenon, which is known as collapse or sample degeneracy in the literature, rules out any meaningful form of approximation in high dimension. In [3, 14], a careful analysis shows that the collapse phenomenon occurs unless the sample size is taken to be exponential in the dimension, which provides a rigorous statement of the curse of dimensionality.
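The collapse mechanism is easy to observe numerically. The sketch below uses a hypothetical Gaussian example chosen only for illustration: the prior $\rho$ is the standard Gaussian on $\mathbb{R}^d$, the observation is $y = 0$, and $g(x,y) = \exp(-|y-x|^2/2)$. This example does not satisfy the boundedness assumption on $g$ made above. The code draws $N$ samples from $\rho$, forms the weights of $C_nS^N\rho$, and reports the effective sample size $1/\sum_i w(i)^2$. This standard degeneracy diagnostic equals $N$ for uniform weights and $1$ when all the mass sits on a single sample.

```python
import math
import random


def effective_sample_size(logw):
    """1 / sum_i w(i)^2 for normalized weights w(i) proportional to exp(logw[i])."""
    m = max(logw)  # subtract the maximum before exponentiating, for numerical stability
    w = [math.exp(l - m) for l in logw]
    s = sum(w)
    return s * s / sum(x * x for x in w)


def ess_after_correction(d, N=100, seed=0):
    """Draw N samples from rho = N(0, I_d) and reweight them by g(x, 0) = exp(-|x|^2 / 2)."""
    rng = random.Random(seed)
    logw = [-0.5 * sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)) for _ in range(N)]
    return effective_sample_size(logw)


if __name__ == "__main__":
    for d in (1, 10, 100, 1000):
        print(d, round(ess_after_correction(d), 2))
```

In this example the population ratio $(\mathbf{E}w)^2/\mathbf{E}w^2$ equals $(\sqrt3/2)^d$. The effective sample size is therefore roughly $0.87N$ at $d = 1$ and falls towards a single sample once $(\sqrt3/2)^dN$ is of order one.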

Remark 1.1. The problem of sampling from a weighted measure of the form $(C\rho)(dx) := g(x)\rho(dx)/\int g(z)\rho(dz)$ appears in numerous applications in statistics, computer science, and physics. The naive approximation $C\rho\approx CS^N\rho$ is well known to be useless in large-scale problems: instead, Markov Chain Monte Carlo (MCMC) methods are almost universally used for this purpose and can be rigorously studied in high dimension (e.g., [12]). However, even if we were able to sample exactly from the weighted measure $C\rho$, this would still not resolve our problems in the filtering context. Indeed, suppose that we could implement the "exact" particle filtering recursion $\tilde F_n = S^NC_nP$ rather than the bootstrap filter $\hat F_n = C_nS^NP$. Then the error between $\pi_1^\mu = F_1\mu$ and $\tilde\pi_1^\mu = \tilde F_1\mu$ would be dimension-free, but the error between $\pi_2^\mu = F_2\pi_1^\mu$ and $\tilde\pi_2^\mu = \tilde F_2\tilde\pi_1^\mu$ would again exhibit exponential dependence on the dimension due to the sampling performed in the first time step. The curse of dimensionality would therefore still arise essentially as above due to the recursive nature of the filtering problem, even disregarding the difficulty of sampling from a weighted distribution. In particular, replacing the sampling step in the particle filter by an MCMC method does not resolve the fundamental problem that we face in high dimension (see [2] for related discussion).

If, instead of computing the filter $\mathbf{P}[X_n\in\cdot\,|\,Y_1,\ldots,Y_n]$, we wish to compute the full conditional path distribution $\mathbf{P}[(X_0,\ldots,X_n)\in\cdot\,|\,Y_1,\ldots,Y_n]$ (known as
the smoothing problem), MCMC methods can be successfully employed in high 
dimension. However, this procedure requires the entire history of observations and 
is not recursive, so that it cannot be implemented on-line and is impractical over 
a long time horizon (cf. [2]). The crucial question to be addressed is therefore 
whether it is possible to develop filtering algorithms that are both recursive and 
that admit error bounds that are uniform in time and in the model dimension. 
1.3. Contributions of this paper. While the curse of dimensionality in particle
filters is now fairly well understood, it is far from clear how one could go about 
addressing this problem. Several fundamental questions arise directly: 

1 . What sort of filtering models are natural to investigate in high dimension? 

2. What sort of mechanism might allow us to surmount the curse of dimensionality? How can such a mechanism be exploited algorithmically?

3. What sort of mathematical tools are needed to address such problems? 

We aim to address each of these questions in the sequel. We will presently provide 
an informal discussion of some basic ideas in this paper; much of the remainder of 
the paper will be devoted to making these ideas precise. 

Some basic insight can be obtained by considering again the trivial model of the previous section. Although the bootstrap particle filter suffers from the curse of dimensionality when applied to the full model, it is obvious in this case that one can surmount this problem in a trivial fashion: as each of the coordinates of the high-dimensional model is independent, one can simply run an independent bootstrap filter in each coordinate. It is evident that the local error of this algorithm (that is, the error of the marginal of the filter in each coordinate) is, by construction, independent of the model dimension (that is, the number of coordinates).
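In the trivial model, this local strategy can be illustrated in the same hypothetical Gaussian setting as before: standard Gaussian prior, observation $y = 0$, and a unit Gaussian likelihood in each coordinate. These are illustrative assumptions only. The sketch performs a single correction step in two ways on the same particles. The global version uses the weights $\prod_i\tilde g(x^i,y^i)$. The coordinatewise version reweights coordinate $i$ by $\tilde g(x^i,y^i)$ alone, as an independent bootstrap filter in that coordinate would. The function returns the global effective sample size and the smallest coordinatewise one.

```python
import math
import random


def ess_of_log_weights(logw):
    """Effective sample size 1 / sum_i w(i)^2 of the weights normalized from exp(logw)."""
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    s = sum(w)
    return s * s / sum(x * x for x in w)


def global_vs_local_ess(d, N=100, seed=0):
    """Compare one global correction step with d coordinatewise (local) correction steps."""
    rng = random.Random(seed)
    # particles[i][j] is coordinate j of particle i, drawn from the prior N(0, I_d)
    particles = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]
    # Local log-weights: coordinate j uses only its own observation y^j = 0
    local = [[-0.5 * particles[i][j] ** 2 for i in range(N)] for j in range(d)]
    # Global log-weights: product of all local likelihoods, i.e. the sum of log-weights
    global_logw = [sum(local[j][i] for j in range(d)) for i in range(N)]
    global_ess = ess_of_log_weights(global_logw)
    min_local_ess = min(ess_of_log_weights(lw) for lw in local)
    return global_ess, min_local_ess


if __name__ == "__main__":
    for d in (1, 100, 1000):
        print(d, global_vs_local_ess(d))
```

The coordinatewise effective sample sizes are computed from one-dimensional problems and do not deteriorate as $d$ grows, whereas the global one collapses.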

We would like to extend this idea to nontrivial models that are of genuine prac- 
tical interest. Unfortunately, it is far from clear initially how this can be done. The 
local algorithm above was made possible because the coordinates of the trivial 
model are truly independent. When this is not the case, we cannot run independent
particle filters in each dimension as all the dimensions are coupled by the dynam- 
ics of the model. We therefore aim to exhibit a more general probabilistic structure 
that can be similarly exploited in a broad class of high-dimensional models. 

In most data assimilation problems, the high-dimensional nature of the model 
is essentially due to its spatial structure: the aim of the problem is to track the 
dynamics of a random field (for example, the atmospheric pressure and temperature 
fields in the case of weather forecasting). We therefore take as a starting point the notion that the coordinates $X_n^v, Y_n^v$ ($v\in V$) of our hidden Markov model are indexed by a large graph $G=(V,E)$ that represents the spatial degrees of freedom of the model. It is of course not reasonable to expect that the dynamics at each spatial location are independent, as was assumed in the trivial model of the previous section. On the other hand, the dynamics of spatial systems are typically local in nature: the dynamics at a spatial location depend only on the states at locations in a neighborhood. Moreover, the observations are typically local in the
sense that (a subset of) spatial locations are observed independently. Such local 
filtering models are prototypical of a broad range of high-dimensional filtering 
problems, and will provide the basic framework for our main result (section 2.1). 
While the law of the model at each spatial location is no longer independent as in the trivial model of the previous section, large-scale interacting systems can nonetheless exhibit an approximate version of this property: this is the decay of correlations phenomenon that has been particularly well studied in statistical mechanics (see, e.g., [8]). Informally speaking, while the states $X_n^v$ and $X_n^w$ at two sites $v,w\in V$ are probably quite strongly correlated when $v$ and $w$ are close together, one might expect that $X_n^v$ and $X_n^w$ are nearly independent when $v$ and $w$ are far apart as measured with respect to the natural distance in the graph $G$.

The core idea of this paper is that the decay of correlations property can provide a mechanism to surmount the curse of dimensionality. A speculative back-of-the-envelope computation explains how this might work. Due to the decay of correlations, the conditional distribution of $X_n^v$ given the new observation $Y_n$ should not depend significantly on observations at sites $w$ distant from $v$. Suppose we can develop a local particle filtering algorithm that at each site $v$ only uses observations in a local neighborhood $K$ of $v$ to update the filtering distribution. As we have seen in the previous section, the sampling error is controlled by the dimension of the observations: as we have now restricted to observations in $K$, the sampling error at each site will be exponential only in $\operatorname{card}K$ rather than in the full dimension $\operatorname{card}V$. On the other hand, the truncation to observations in $K$ is only approximate: the decay of correlations property suggests that the bias introduced by this truncation should decay exponentially in $\operatorname{diam}K$. Therefore,

$$\text{error} = \text{bias} + \text{variance} \sim e^{-\operatorname{diam}K} + \frac{e^{\operatorname{card}K}}{\sqrt N}.$$

If the size of the neighborhoods $K$ is chosen so as to optimize the error, then the resulting algorithm is evidently consistent (with a slower convergence rate than the standard $1/\sqrt N$ Monte Carlo rate: this is likely unavoidable in high dimension) with an error bound that is independent of the model dimension $\operatorname{card}V$.
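The trade-off can be made concrete under simplifying assumptions that are not part of the paper's analysis. Take $V$ to be a line of sites, let $K$ be a block of $b$ consecutive sites (so $\operatorname{card}K = b$ and $\operatorname{diam}K = b-1$), and set all constants hidden in "$\sim$" equal to one. The sketch below minimizes this heuristic error over $b$ for a given $N$.

```python
import math


def heuristic_error(b, N):
    """exp(-diam K) + exp(card K) / sqrt(N) for a block of b consecutive sites (unit constants)."""
    return math.exp(-(b - 1)) + math.exp(b) / math.sqrt(N)


def best_block_size(N, b_max=100):
    """Block size b in 1..b_max minimizing the heuristic error; card V never enters."""
    return min(range(1, b_max + 1), key=lambda b: heuristic_error(b, N))


if __name__ == "__main__":
    for N in (10**4, 10**6, 10**8):
        b = best_block_size(N)
        print(N, b, heuristic_error(b, N))
```

Setting the derivative in $b$ to zero gives $e^{2b} = e\sqrt N$. Under these assumptions the optimal block therefore grows like $\frac14\log N$, and the optimized error behaves like $2e^{1/2}N^{-1/4}$. The error is consistent but slower than $1/\sqrt N$, and free of $\operatorname{card}V$, as anticipated above.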

The main result of this paper is that these speculative ideas can be made precise 
at least for one particularly simple local filtering algorithm: the block particle fil- 
ter (section 2.2). While the above back-of-envelope computation provides a basic 
template for our approach, the rigorous implementation of these ideas requires the 
introduction of mathematical machinery that has not previously been applied in the 
study of nonlinear filtering. Just as in the case of the filter stability property (see 
[16] and the references therein), it is far from clear that any decay of correlations 
properties of the underlying model are inherited by the filter as we have taken for 
granted above: in fact, striking counterexamples show that such inheritance can fail 
in surprising ways [13]. The investigation of such problems constitutes an essential 
part of our proofs. More generally, in our setting, it is necessary to develop a local 
analysis of high-dimensional filtering problems. The main workhorse that we use 
for this purpose is a powerful (albeit blunt) tool that we borrow from statistical mechanics, the Dobrushin comparison theorem [8, Theorem 8.20], that will be used repeatedly in our proofs in various different ways. The decay of correlations property crucially enters this local analysis: not only does it allow us to control the bias, but it also allows us to control the spatial accumulation of approximation errors, much as the filter stability property is used to control the accumulation of approximation errors over time. An outline of the main steps and ideas in the proof of our main result will be given in section 3; detailed proofs are given in section 4.

It should be emphasized that our result, while providing a first rigorous analy- 
sis of a local particle filtering algorithm in high dimension, is essentially a proof 
of concept. The general idea to exploit decay of correlations provides a promising 
approach to the curse of dimensionality problem (such a possibility has also been 
occasionally mentioned in the applied literature, e.g., [17, 14]); however, the block 
particle filter that we analyze is the simplest possible algorithm of its type, and 
possesses some inherent limitations that can potentially be addressed by the devel- 
opment of more sophisticated local particle filters. In section 2.3, we will discuss 
some limitations of our results and potential directions for further investigation. 

2. Main result and discussion. 

2.1. Filtering models in high dimension. This paper is concerned with filtering problems in high dimension. In order to investigate such problems systematically, we presently introduce a class of high-dimensional filtering models that will provide the basic framework to be investigated throughout this paper. In these models, the state $(X_n, Y_n)$ at each time $n$ is a random field $(X_n^v, Y_n^v)_{v\in V}$ indexed by a (finite) undirected graph $G=(V,E)$. The graph $G$ describes the spatial degrees of
freedom of the model, and the underlying dynamics and observations are local with 
respect to the graph structure in a sense to be made precise below. The dimension 
of the model should be interpreted as the cardinality of the vertex set V, which is 
typically assumed to be large. Our aim is to develop quantitative results that are, 
under appropriate assumptions, independent of the dimension card V. 

We now define the hidden Markov model $(X_n, Y_n)_{n\ge 0}$ to be considered in the sequel (we will adopt throughout the basic setting and notation introduced in section 1.1). The state spaces $\mathbb{X}$ and $\mathbb{Y}$ of $X_n$ and $Y_n$, and the reference measures $\psi$ and $\varphi$ of the transition densities $p$ and $g$, respectively, are of product form

$$\mathbb{X} = \prod_{v\in V}\mathbb{X}^v, \qquad \mathbb{Y} = \prod_{v\in V}\mathbb{Y}^v, \qquad \psi = \bigotimes_{v\in V}\psi^v, \qquad \varphi = \bigotimes_{v\in V}\varphi^v,$$

where $\psi^v$ and $\varphi^v$ are reference measures on the Polish spaces $\mathbb{X}^v$ and $\mathbb{Y}^v$, respectively. The transition densities $p$ and $g$ are given by

$$p(x,z) = \prod_{v\in V} p^v(x,z^v), \qquad g(x,y) = \prod_{v\in V} g^v(x^v,y^v),$$

where $p^v:\mathbb{X}\times\mathbb{X}^v\to\mathbb{R}_+$ and $g^v:\mathbb{X}^v\times\mathbb{Y}^v\to\mathbb{R}_+$ are transition densities with respect to the reference measures $\psi^v$ and $\varphi^v$, respectively.

Fig 3. Dependency graph of a high-dimensional filtering model of the type considered in this paper.

The spatial graph G is endowed with its natural distance d (that is, d{v,v') is 
the length of the shortest path in G between v, v' € V). Let us fix throughout a 
neighborhood size r G N, and define for each vertex v ^ V the r-neighborhood 

N{v) = {v' €V : d{v,v') < r}. 

We will assume that the dynamics of the underlying process $(X_n)_{n\ge 0}$ is local in the sense that $p^v(x,z^v)$ depends on $x^{N(v)}$ only (we write $x^J = (x^j)_{j\in J}$ for $J\subseteq V$):

$$p^v(x,z^v) = p^v(\tilde x,z^v)\quad\text{whenever } x^{N(v)} = \tilde x^{N(v)}.$$

That is, the conditional distribution of $X_n^v$ given $X_0,\ldots,X_{n-1}$ depends on $X_{n-1}^{N(v)}$ only. Similarly, by construction, the observations are local in that the conditional distribution of $Y_n^v$ given $X_n$ depends on $X_n^v$ only. This dependence structure is illustrated in Figure 3 (in the simplest case of a linear graph $G$ with $r = 1$).
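The locality assumption can be made concrete with a small Python sketch. It computes the $r$-neighborhoods $N(v)$ by breadth-first search and defines a hypothetical local transition density on binary spins; the majority rule, the names `local_density` and `sample_step`, and the value `eps=0.6` are illustrative assumptions, not part of the model above. The density reads $x$ only through $x^{N(v)}$, and it satisfies $\varepsilon\le p^v\le\varepsilon^{-1}$ with $\varepsilon = 0.6$ because $2-0.6\le 1/0.6$.

```python
from collections import deque
import random


def r_neighborhoods(adj, r):
    """Return {v: N(v)} with N(v) = {v' : d(v, v') <= r}, d the shortest-path distance (BFS)."""
    nbhd = {}
    for v in adj:
        dist = {v: 0}
        queue = deque([v])
        while queue:
            u = queue.popleft()
            if dist[u] == r:
                continue  # do not explore beyond distance r
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        nbhd[v] = frozenset(dist)
    return nbhd


def line_graph(n):
    """The linear graph 0 - 1 - ... - (n-1), as in Figure 3."""
    return {v: [w for w in (v - 1, v + 1) if 0 <= w < n] for v in range(n)}


def local_density(x, z_v, v, nbhd, eps=0.6):
    """Hypothetical p^v(x, z^v) for spins in {0, 1}, w.r.t. the uniform probability psi^v.

    The new spin equals the majority of x on N(v) (ties -> 1) with probability 1 - eps/2,
    so the density takes the values 2 - eps and eps, and it reads x only through x^{N(v)}.
    """
    ones = sum(x[w] for w in nbhd[v])
    majority = 1 if 2 * ones >= len(nbhd[v]) else 0
    return 2.0 - eps if z_v == majority else eps


def sample_step(x, nbhd, rng, eps=0.6):
    """Draw X_n^v ~ p^v(X_{n-1}, .) psi^v independently over v, given X_{n-1} = x."""
    # psi^v puts mass 1/2 on each spin, so P(X_n^v = 1) = p^v(x, 1) / 2.
    return [int(rng.random() < local_density(x, 1, v, nbhd, eps) / 2) for v in sorted(nbhd)]


# Usage: five steps of the local dynamics on a line of 8 sites with r = 1.
nbhd = r_neighborhoods(line_graph(8), r=1)
rng = random.Random(0)
x = [0] * 8
for _ in range(5):
    x = sample_step(x, nbhd, rng)
print(x)
```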

Markov models of the form introduced above appear in the literature under var- 
ious names, such as locally interacting Markov chains or probabilistic cellular au- 
tomata [7, 10]. Such models arise naturally in numerous complex and large-scale 
applications, including percolation models of disease spread or forest fires, free-
way traffic flow models, probabilistic models on networks and large-scale queue- 
ing systems, and various biological, ecological and neural models. Moreover, local 
Markov processes of this type arise naturally from finite-difference approximation 
of stochastic partial differential equations, and are therefore in principle applicable 
to a diverse set of data assimilation problems that arise in areas such as weather 
forecasting, oceanography, and geophysics (cf. section 2.3.3). While more general 
models are certainly of substantial interest, the model defined above is prototypical 
of a broad range of high-dimensional data assimilation problems and provides a 
basic setting for the investigation of filtering problems in high dimension. 

2.2. Block particle filter: dimension-free bounds. As was explained in section 
1.2, the bootstrap particle filter is not well suited to high-dimensional models: the 
approximation error generally grows exponentially in the model dimension card V. 
To surmount this problem, we aim to develop local particle filtering algorithms 
that can exploit decay of correlations properties of the underlying filtering model. 
In this paper, we will investigate in detail the simplest possible algorithm of this 
type, the block particle filter, that will be introduced presently. While this algorithm 
possesses some inherent limitations (see below), it is the simplest local algorithm 
both mathematically and computationally, and therefore provides an ideal starting 
point for the investigation of particle filters in high dimension. 

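To define the block particle filtering algorithm, we begin by introducing a partition $\mathcal{K}$ of the vertex set $V$ into nonoverlapping blocks: that is, we have

$$V = \bigcup_{K\in\mathcal{K}} K,\qquad K\cap K' = \varnothing\ \text{ for } K\ne K',\ K,K'\in\mathcal{K}.$$

We now define the blocking operator

$$\mathsf{B}\rho := \bigotimes_{K\in\mathcal{K}}\mathsf{B}^K\rho,$$

where for any measure $\rho$ on $\mathbb{X} = \prod_{v\in V}\mathbb{X}^v$ and $J\subseteq V$ we denote by $\mathsf{B}^J\rho$ the marginal of $\rho$ on $\prod_{v\in J}\mathbb{X}^v$. Thus the random field described by the measure $\mathsf{B}\rho$ on $\mathbb{X}$ is independent across different blocks defined by the partition $\mathcal{K}$, while the marginal on each block agrees with the original measure $\rho$. The block particle filter inserts an additional blocking step into the bootstrap particle filter recursion: that is,

$$\hat\pi_n^\mu := \hat{\mathsf{F}}_n\hat\pi_{n-1}^\mu,\qquad \hat\pi_0^\mu := \mu,$$

where $\hat{\mathsf{F}}_n := \mathsf{C}_n\mathsf{B}\mathsf{S}^N\mathsf{P}$ consists of four steps

$$\hat\pi_{n-1}^\mu \xrightarrow{\text{prediction}} \mathsf{P}\hat\pi_{n-1}^\mu \xrightarrow{\text{sampling}} \mathsf{S}^N\mathsf{P}\hat\pi_{n-1}^\mu \xrightarrow{\text{blocking}} \mathsf{B}\mathsf{S}^N\mathsf{P}\hat\pi_{n-1}^\mu \xrightarrow{\text{correction}} \mathsf{C}_n\mathsf{B}\mathsf{S}^N\mathsf{P}\hat\pi_{n-1}^\mu = \hat\pi_n^\mu.$$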
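The blocking operator can be made concrete on a small discrete example. The sketch below assumes a measure on $\{0,1\}^V$ with small $V$, stored as a Python dict from configurations to probabilities (a representation chosen only for illustration). It forms $\mathsf{B}\rho$ as the product of the block marginals $\mathsf{B}^K\rho$, so each block marginal is preserved while distinct blocks become independent.

```python
import itertools
import random


def block_marginal(rho, block):
    """B^J rho: marginal on the sites in `block` of a pmf {configuration tuple: probability}."""
    marg = {}
    for x, p in rho.items():
        key = tuple(x[v] for v in block)
        marg[key] = marg.get(key, 0.0) + p
    return marg


def blocking_operator(rho, partition, n_sites):
    """B rho = product over K of B^K rho, returned as a pmf on {0, 1}^V with V = range(n_sites)."""
    marginals = [block_marginal(rho, K) for K in partition]
    out = {}
    for x in itertools.product((0, 1), repeat=n_sites):
        prob = 1.0
        for K, marg in zip(partition, marginals):
            prob *= marg.get(tuple(x[v] for v in K), 0.0)
        out[x] = prob
    return out


# Usage: a random pmf on four spins, blocked into K1 = {0, 1} and K2 = {2, 3}.
rng = random.Random(0)
configs = list(itertools.product((0, 1), repeat=4))
weights = [rng.random() for _ in configs]
total = sum(weights)
rho = {x: w / total for x, w in zip(configs, weights)}
b_rho = blocking_operator(rho, [(0, 1), (2, 3)], 4)
```

With the trivial partition $\mathcal{K} = \{V\}$ the same function returns $\rho$ itself, which mirrors the reduction of the block particle filter to the bootstrap filter.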
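Algorithm 2: Block particle filter

Let $\hat\pi_0 := \mu$;
for $k = 1,\ldots,n$ do
    Sample i.i.d. $x_{k-1}(i)$, $i = 1,\ldots,N$, from the distribution $\hat\pi_{k-1}$;
    Sample $x_k^v(i) \sim p^v(x_{k-1}(i),\cdot)\,\psi^v$, $i = 1,\ldots,N$, $v\in V$;
    Compute $w_k^K(i) = \dfrac{\prod_{v\in K} g^v(x_k^v(i),Y_k^v)}{\sum_{j=1}^N\prod_{v\in K} g^v(x_k^v(j),Y_k^v)}$, $i = 1,\ldots,N$, $K\in\mathcal{K}$;
    Let $\hat\pi_k := \bigotimes_{K\in\mathcal{K}}\sum_{i=1}^N w_k^K(i)\,\delta_{x_k^K(i)}$;
end

Fig 4. The block particle filtering algorithm considered in this paper.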

The resulting algorithm is given in Figure 4. In the special case $\mathcal{K} = \{V\}$, the block particle filter reduces to the bootstrap particle filter, so that the former is a strict generalization of the latter (we have therefore not introduced a separate notation for the bootstrap particle filter: in the sequel, the notation $\hat\pi_n^\mu$ always refers to the block particle filter). However, the introduction of independent blocks allows us to localize the algorithm, which will be crucial in the high-dimensional setting.

It is immediately evident from inspection of the block particle filtering algorithm that only observations in block $K$ are used by the algorithm to update the filtering distribution in block $K$. Therefore, following the heuristic ideas of section 1.3, we expect that the sampling error of the algorithm is exponential in $\mathrm{card}\,K$ rather than in the model dimension $\mathrm{card}\,V$. To control the bias introduced by the blocking step, note that the blocking operator $\mathsf{B}$ decouples the distribution $\rho$ at the boundaries of the blocks. The decay of correlations property (if it can be established) should cause the influence of such a perturbation on the marginal distribution at a vertex $v\in K$ to decay exponentially in the distance from $v$ to the boundary of the block $K$. Thus the back-of-the-envelope computation in section 1.3 applies to the local error at "most" vertices, as the boundaries of the blocks only constitute a small fraction of the total number of vertices. On the other hand, the error will necessarily be larger for vertices closer to the block boundaries. This spatial inhomogeneity of the local error is an inherent limitation of the block particle filter that one might hope to alleviate by the development of more sophisticated local particle filters. We postpone further discussion of this point to section 2.3.2.

Having introduced the block particle filtering algorithm, we now proceed to formulate the main result of this paper (Theorem 2.1 below).

Recall that we have introduced the neighborhoods

$$N(v) := \{v'\in V : d(v,v')\le r\}$$

above, where the neighborhood size $r$ is fixed throughout this paper (in our model, the state of vertex $v$ depends only on the states of vertices in $N(v)$ in the previous time step). Given a set $J\subseteq V$, we denote the $r$-inner boundary of $J$ as

$$\partial J := \{v\in J : N(v)\not\subseteq J\}$$

(that is, $\partial J$ is the subset of vertices in $J$ that can interact with vertices outside $J$ in one step of the dynamics). We also define the following quantities:

$$|\mathcal{K}|_\infty := \max_{K\in\mathcal{K}}\mathrm{card}\,K,\qquad \Delta := \max_{v\in V}\mathrm{card}\{v'\in V : d(v,v')\le r\},\qquad \Delta_{\mathcal{K}} := \max_{K\in\mathcal{K}}\mathrm{card}\{K'\in\mathcal{K} : d(K,K')\le r\},$$

where we define as usual $d(J,J') := \min_{v\in J}\min_{v'\in J'} d(v,v')$ for $J,J'\subseteq V$. Thus $|\mathcal{K}|_\infty$ is the maximal size of a block in $\mathcal{K}$, while $\Delta$ ($\Delta_{\mathcal{K}}$) is the maximal number of vertices (blocks) that interact with a single vertex (block) in one step of the dynamics. It should be emphasized that $r$, $\Delta$ and $\Delta_{\mathcal{K}}$ are local quantities that depend on the geometry but not on the size of the spatial graph $G$.
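Finally, we introduce for $J\subseteq V$ the local distance

$$|||\rho-\rho'|||_J := \sup_{f\in\mathcal{X}^J:\,|f|\le 1}\mathbf{E}\big[|\rho(f)-\rho'(f)|^2\big]^{1/2}$$

between random measures $\rho,\rho'$ on $\mathbb{X}$, where $\mathcal{X}^J$ denotes the class of measurable functions $f:\mathbb{X}\to\mathbb{R}$ such that $f(x) = f(\tilde x)$ whenever $x^J = \tilde x^J$.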
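For nonrandom measures on a finite spin field, the expectation in $|||\cdot|||_J$ is trivial. The supremum over $f\in\mathcal{X}^J$ with $|f|\le 1$ then equals the $\ell_1$ distance between the marginals on $J$. The sketch below is restricted to this deterministic case and stores measures as dicts; both choices are assumptions made for illustration. It computes the quantity in closed form and, as a check, by brute force over $f = \pm 1$ on each value of $x^J$.

```python
import itertools


def marginal(rho, J):
    """B^J rho for a pmf given as {configuration tuple: probability}."""
    out = {}
    for x, p in rho.items():
        key = tuple(x[v] for v in J)
        out[key] = out.get(key, 0.0) + p
    return out


def local_distance(rho, rho_prime, J):
    """|||rho - rho'|||_J for NONRANDOM pmfs: the supremum over |f| <= 1 depending only
    on x^J is the l1 distance between the marginals on J (attained at f = sign)."""
    m, m_prime = marginal(rho, J), marginal(rho_prime, J)
    keys = set(m) | set(m_prime)
    return sum(abs(m.get(k, 0.0) - m_prime.get(k, 0.0)) for k in keys)


def local_distance_brute_force(rho, rho_prime, J):
    """Same quantity by maximizing |rho(f) - rho'(f)| over f = +-1 on each value of x^J
    (a linear functional on the cube |f| <= 1 attains its maximum at a vertex)."""
    m, m_prime = marginal(rho, J), marginal(rho_prime, J)
    keys = sorted(set(m) | set(m_prime))
    diff = [m.get(k, 0.0) - m_prime.get(k, 0.0) for k in keys]
    return max(abs(sum(s * d for s, d in zip(signs, diff)))
               for signs in itertools.product((-1.0, 1.0), repeat=len(diff)))


# Usage: two pmfs on three spins compared on J = {0, 2}.
configs = list(itertools.product((0, 1), repeat=3))
a = {x: (1 + i) / 36 for i, x in enumerate(configs)}
b = {x: 1 / 8 for x in configs}
print(local_distance(a, b, (0, 2)), local_distance_brute_force(a, b, (0, 2)))
```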

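Theorem 2.1. There exists a constant $0<\varepsilon_0<1$, depending only on the local quantities $\Delta$ and $\Delta_{\mathcal{K}}$, such that the following holds. Suppose there exist $\varepsilon_0<\varepsilon<1$ and $0<\kappa<1$ such that

$$\varepsilon\le p^v(x,z^v)\le\varepsilon^{-1},\qquad \kappa\le g^v(x^v,y^v)\le\kappa^{-1}\qquad\forall\,v\in V,\ x,z\in\mathbb{X},\ y\in\mathbb{Y}.$$

Then for every $n\ge 0$, $x\in\mathbb{X}$, $K\in\mathcal{K}$ and $J\subseteq K$ we have

$$|||\pi_n^x-\hat\pi_n^x|||_J \le \alpha\,\mathrm{card}\,J\left[e^{-\beta_1 d(J,\partial K)} + \frac{e^{\beta_2|\mathcal{K}|_\infty}}{N^{1/2}}\right],$$

where the constants $0<\alpha,\beta_1,\beta_2<\infty$ depend only on $\varepsilon$, $\kappa$, $r$, $\Delta$, and $\Delta_{\mathcal{K}}$.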

The key point of this result is that both the assumptions and the resulting error 
bound depend only on local quantities. In particular, the assumptions and error 
bound depend neither on time n nor on the model dimension card V. 






Remark 2.2. A threshold requirement of the form $\varepsilon>\varepsilon_0$ is essential in order to obtain the decay of correlations property: the decay of correlations can fail if $\varepsilon>0$ is too small (a phenomenon known as phase transition in statistical mechanics). Otherwise, the assumptions of Theorem 2.1 are comparable to assumptions commonly imposed in the literature to obtain error bounds for the bootstrap particle filter [5, 6] and possess similar limitations. We postpone a discussion of these issues to section 2.3.1 below. Let us also note that explicit expressions for the constants in Theorem 2.1 can be read off from the proofs; however, we do not believe that our methods are sufficiently sharp to yield practical quantitative results.

Remark 2.3. The particle filter $\hat\pi_n^x$ depends both on the random samples that are drawn in the algorithm and on the random sequence of the observations. However, the randomness of the observations plays no role in our proofs. One can therefore interpret the expectation in the definition of $|||\cdot|||_J$ as being taken only with respect to the random sampling mechanism in the block particle filter, and the bound of Theorem 2.1 as holding uniformly with respect to the observation sequence.

Remark 2.4. In Theorem 2.1 we have considered $\pi_n^x$ and $\hat\pi_n^x$ with a nonrandom initial condition $x\in\mathbb{X}$. This is a choice of convenience: the proof of Theorem 2.1 yields the same conclusion for more general initial conditions that satisfy a suitable decay of correlations property. On the other hand, the stability property of the filter (e.g., Corollary 4.7 below) ensures that $\pi_n^\mu$ forgets its initial condition $\mu$ exponentially fast uniformly in the dimension, so there is little loss of generality in choosing a computationally convenient initial condition.

To provide a concrete illustration of Theorem 2.1, we consider in the remainder of this section the example where the spatial graph $G$ is a square lattice, that is,

$$V = \{-d,\ldots,d\}^q\qquad (d,q\in\mathbb{N})$$

endowed with its natural edge structure. Note that in this case, the graph distance $d(v,v')$ is simply the $\ell_1$-distance between the corresponding vectors of integers. To define the partition $\mathcal{K}$, we cover $V$ by blocks of radius $b\in\mathbb{N}$: that is,

$$\mathcal{K} = \{(x+\{-b,\ldots,b\}^q)\cap V : x\in(2b+1)\mathbb{Z}^q\}.$$

We assume for simplicity in the sequel that $b\ge r$, and that $(2d+1)/(2b+1)$ is an integer so that all $K\in\mathcal{K}$ are translates of $\{-b,\ldots,b\}^q$ (this slightly simplifies our arguments below but is not essential to our results). We can easily compute

$$|\mathcal{K}|_\infty = (2b+1)^q,\qquad \Delta\le(2r+1)^q,\qquad \Delta_{\mathcal{K}}\le 3^q.$$
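The local quantities of the square lattice can be checked by brute force for small lattices. The sketch below assumes the $\ell_1$ graph distance and the block layout just described. `lattice_quantities` is a hypothetical helper whose cost grows quadratically in $\mathrm{card}\,V$, so it is only meant for small $d$.

```python
import itertools


def l1(u, w):
    """Graph distance on the square lattice (the l1 distance)."""
    return sum(abs(a - c) for a, c in zip(u, w))


def lattice_quantities(d, q, b, r):
    """Brute-force (|K|_inf, Delta, Delta_K) for V = {-d, ..., d}^q and blocks of radius b."""
    assert (2 * d + 1) % (2 * b + 1) == 0, "blocks must tile V exactly"
    sites = list(itertools.product(range(-d, d + 1), repeat=q))
    blocks = {}
    for v in sites:
        # v belongs to the block centred at the nearest point of (2b+1) Z^q.
        centre = tuple((2 * b + 1) * round(c / (2 * b + 1)) for c in v)
        blocks.setdefault(centre, []).append(v)
    size = max(len(K) for K in blocks.values())
    delta = max(sum(l1(v, w) <= r for w in sites) for v in sites)
    delta_k = max(
        sum(min(l1(u, w) for u in K1 for w in K2) <= r for K2 in blocks.values())
        for K1 in blocks.values()
    )
    return size, delta, delta_k


print(lattice_quantities(4, 2, 1, 1))
```

For $q = 2$ and $r = b = 1$, this gives $|\mathcal{K}|_\infty = 9$, $\Delta = 5$ and $\Delta_{\mathcal{K}} = 5$, within the bounds $(2r+1)^q = 9$ and $3^q = 9$.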
Note that these local quantities do not depend on the size $d$ of our lattice. In a data assimilation application one might have, for example, $q = 2$, $r = 1$, $d\sim 10^{\ldots}$.
Consider the block $K = \{-b,\ldots,b\}^q$. Note that for $u = 0,\ldots,b-r$

$$\{v\in K : d(v,\partial K) > u\} = \{-(b-r-u),\ldots,b-r-u\}^q.$$

Fix $0<\delta<1$ and choose $u = \lfloor\delta(2b+1)/(2q) - r\rfloor$. Then

$$\frac{\mathrm{card}\{v\in K : d(v,\partial K)>u\}}{\mathrm{card}\,K} = \left(\frac{2(b-r-u)+1}{2b+1}\right)^q \ge 1-\delta,$$

where we have used $1-(1-\delta)^{1/q}\ge\delta/q$. The same conclusion evidently holds for every block $K\in\mathcal{K}$. Thus Theorem 2.1 gives the following corollary.

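Corollary 2.5. In the square lattice setting $V = \{-d,\ldots,d\}^q$, there exists a constant $0<\varepsilon_0<1$, depending only on $r$ and $q$, such that the following holds. Suppose there exist $\varepsilon_0<\varepsilon<1$ and $0<\kappa<1$ such that

$$\varepsilon\le p^v(x,z^v)\le\varepsilon^{-1},\qquad\kappa\le g^v(x^v,y^v)\le\kappa^{-1}\qquad\forall\,v\in V,\ x,z\in\mathbb{X},\ y\in\mathbb{Y}.$$

Then for every $x\in\mathbb{X}$, $n\ge 0$, and $0<\delta<1$ we have

$$\mathrm{card}\left\{v\in V : |||\pi_n^x-\hat\pi_n^x|||_v\le\alpha'\left[e^{-\beta_1'\delta(2b+1)}+\frac{e^{\beta_2'(2b+1)^q}}{N^{1/2}}\right]\right\}\ge(1-\delta)\,\mathrm{card}\,V,$$

where the constants $0<\alpha',\beta_1',\beta_2'<\infty$ depend only on $\varepsilon$, $\kappa$, $r$, and $q$.

In particular, if we choose the block size $b = \lfloor\frac12(4\beta_2')^{-1/q}\log^{1/q}N-\frac12\rfloor$, then

$$\mathrm{card}\left\{v\in V : |||\pi_n^x-\hat\pi_n^x|||_v\le c_1e^{-c_2\delta\log^{1/q}N}\right\}\ge(1-\delta)\,\mathrm{card}\,V$$

and

$$\frac{1}{\mathrm{card}\,V}\sum_{v\in V}|||\pi_n^x-\hat\pi_n^x|||_v\le\frac{c_3}{\log^{1/q}N},$$

where the constants $0<c_1,c_2,c_3<\infty$ depend only on $\varepsilon$, $\kappa$, $r$, and $q$.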
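The choice of block size in Corollary 2.5 makes the sampling term at most $N^{-1/4}$. Since $2b+1\le(4\beta_2')^{-1/q}\log^{1/q}N$, one gets $(2b+1)^q\le\log N/(4\beta_2')$ and hence $e^{\beta_2'(2b+1)^q}\le N^{1/4}$. The sketch below checks this arithmetic. The constant $\beta_2'$ is not known explicitly, so the values used are placeholders. The formula only yields an admissible block size once $N$ is large enough for the resulting $b$ to meet the standing assumption on the block radius.

```python
import math


def block_size(N, q, beta2):
    """b = floor((1/2) (4 beta2')^(-1/q) log^(1/q) N - 1/2), as in Corollary 2.5."""
    return math.floor(0.5 * (4 * beta2) ** (-1 / q) * math.log(N) ** (1 / q) - 0.5)


def variance_term(N, q, beta2):
    """Sampling term e^{beta2' (2b+1)^q} / N^(1/2) of Corollary 2.5 at the chosen block size."""
    b = block_size(N, q, beta2)
    return math.exp(beta2 * (2 * b + 1) ** q) / math.sqrt(N)


# Usage with a placeholder constant beta2' = 0.1: the block size grows like log^(1/q) N.
for N in (10**3, 10**6, 10**12):
    print(N, block_size(N, q=2, beta2=0.1), variance_term(N, q=2, beta2=0.1), N ** -0.25)
```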

Corollary 2.5 makes precise the notion that a properly tuned block particle filter can avoid the curse of dimensionality: choosing the block size $b\sim\log^{1/q}N$, we obtain a local error that can be made arbitrarily small, uniformly both in time $n$ and in the lattice size $d$, by choosing a sufficiently large sample size $N$. More precisely, we see that the local error at most locations (i.e., on an arbitrarily large fraction of the graph) is of order $e^{-c\log^{1/q}N}$ for a constant $c>0$, which is polynomial for $q = 1$ and subpolynomial otherwise. The bound for the average local error is similarly uniform in $n$ and $d$, albeit with a very slow convergence rate. It appears that these results are chiefly limited by the spatial inhomogeneity that is inherent in the block particle filtering algorithm, as will be discussed in section 2.3.2 below.
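The counting step that turns Theorem 2.1 into Corollary 2.5 can also be checked numerically. The sketch below compares the closed form for $\mathrm{card}\{v\in K : d(v,\partial K)>u\}/\mathrm{card}\,K$ with a direct count for an interior block $K = \{-b,\ldots,b\}^q$. The block is viewed inside $\mathbb{Z}^q$, an assumption that avoids edge effects of $V$. It then evaluates the choice $u = \lfloor\delta(2b+1)/(2q) - r\rfloor$. The helper names are hypothetical.

```python
import itertools
import math


def interior_fraction(b, r, q, u):
    """Closed form ((2(b-r-u)+1)/(2b+1))^q of card{v in K : d(v, dK) > u} / card K."""
    return ((2 * (b - r - u) + 1) / (2 * b + 1)) ** q


def choose_u(b, r, q, delta):
    """The choice u = floor(delta (2b+1)/(2q) - r) made before Corollary 2.5."""
    return math.floor(delta * (2 * b + 1) / (2 * q) - r)


def interior_fraction_brute_force(b, r, q, u):
    """Direct count for K = {-b, ..., b}^q inside Z^q with the l1 graph distance."""
    K = list(itertools.product(range(-b, b + 1), repeat=q))
    inside = set(K)
    ball = [o for o in itertools.product(range(-r, r + 1), repeat=q) if sum(map(abs, o)) <= r]
    # r-inner boundary dK: vertices of K whose r-neighborhood is not contained in K.
    boundary = [v for v in K
                if any(tuple(a + c for a, c in zip(v, o)) not in inside for o in ball)]
    far = sum(min(sum(abs(a - c) for a, c in zip(v, w)) for w in boundary) > u for v in K)
    return far / len(K)


# Usage: fraction of a block at distance > u from its boundary, for b = 10, r = 1, q = 2.
b, r, q, delta = 10, 1, 2, 0.5
u = choose_u(b, r, q, delta)
print(u, interior_fraction(b, r, q, u), interior_fraction_brute_force(b, r, q, u), 1 - delta)
```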
Remark 2.6. We have stated the local error in Corollary 2.5 in terms of one-dimensional marginals $|||\pi_n^x-\hat\pi_n^x|||_v$ for simplicity; an analogous result can be obtained for marginals $|||\pi_n^x-\hat\pi_n^x|||_J$ over cubes $J$ of any fixed size.

Remark 2.7. Theorem 2.1 and Corollary 2.5 should be viewed as a theoretical proof of concept that it is possible, in principle, to design particle filters that avoid the curse of dimensionality. In practice, the slow rate $b\sim\log^{1/q}N$ suggests that the block size must typically be quite small (of order unity) for realistic values of the sample size $N$, which yields a large bias term in our bounds. We have nonetheless observed in simple simulations that the algorithm can work quite well even with the choice $b = 0$, so that the practical utility of the algorithm may not be fully captured by our mathematical results. Moreover, specific features of certain data assimilation applications, such as sparsity of observations, could make it possible to choose substantially larger blocks. A systematic investigation of the empirical performance of local particle filtering algorithms in applications is beyond the scope of this paper, however. The practical implementation of local particle filters for data assimilation will likely require further advances in the mathematical, methodological and applied aspects of high-dimensional filtering.

2.3. Discussion. 

2.3.1. Mixing assumptions and the ergodicity threshold. The basic assumption of Theorem 2.1 is that the local transition densities are bounded above and below:

$$\varepsilon\le p^v(x,z^v)\le\varepsilon^{-1}.$$

This is a local counterpart of the mixing assumptions that are routinely employed in the analysis of particle filters [5, 6]. The global mixing assumption $\varepsilon\le p(x,z)\le\varepsilon^{-1}$ would imply that the underlying Markov chain is strongly ergodic (in the sense that its transition kernel is a strict contraction with respect to the total variation distance) and is often used to establish the stability property of the filter; this is essential to obtain a time-uniform bound on the particle filter error, see section 3.1 below. The local mixing assumption $\varepsilon\le p^v(x,z^v)\le\varepsilon^{-1}$ employed here should similarly be viewed as a local ergodicity assumption on the model.

It is well known that strong mixing assumptions of this type impose some constraints on the underlying model. In particular, strong mixing assumptions often require a compact state space: in a noncompact state space the likelihood ratio $p(x,z)/p(x',z)$ is typically unbounded as $|z|\to\infty$ (this is readily verified in linear Gaussian models, for example), while $\varepsilon\le p(x,z)\le\varepsilon^{-1}$ would imply that $p(x,z)/p(x',z)$ is uniformly bounded. Similarly, the assumptions of Theorem 2.1 will typically only hold in models whose local state spaces $\mathbb{X}^v$ and $\mathbb{Y}^v$ are compact.
While qualitative results in this area have been obtained in much more general settings (cf. [16] and the references therein), it has proved to be more difficult to obtain
quantitative results under assumptions weaker than strong mixing conditions: it re- 
mains an open problem, for example, to obtain quantitative time-uniform bounds 
under mild ergodicity assumptions even for the approximation error of the boot- 
strap particle filter. These technical issues are however unrelated to the problems 
that arise in high dimension, and we do not address them here. 
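The remark that strong mixing fails in linear Gaussian models can be seen from a one-line computation. Take the hypothetical scalar model $Z = aX + \sigma\xi$ with $\xi$ standard normal. Then $\log p(x,z)/p(x',z) = [a(x-x')z - a^2(x^2-x'^2)/2]/\sigma^2$ is affine in $z$, and hence unbounded as $|z|\to\infty$ whenever $a\neq 0$ and $x\ne x'$. The sketch below checks this closed form against the density; the values $a = 0.9$ and $\sigma = 1$ are arbitrary.

```python
import math


def gaussian_density(x, z, a=0.9, sigma=1.0):
    """p(x, z) for the hypothetical scalar linear Gaussian model Z = a x + sigma * noise."""
    return math.exp(-((z - a * x) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def log_likelihood_ratio(x, x_prime, z, a=0.9, sigma=1.0):
    """log p(x, z) / p(x', z) in closed form; it is affine in z with slope a (x - x') / sigma^2."""
    return (a * (x - x_prime) * z - a**2 * (x**2 - x_prime**2) / 2) / sigma**2


# Usage: the ratio p(1, z) / p(0, z) grows without bound as z increases.
for z in (1.0, 10.0, 100.0):
    print(z, math.exp(log_likelihood_ratio(1.0, 0.0, z)))
```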

On the other hand, there is a crucial assumption in Theorem 2.1 that does not arise in finite dimension. In classical results on particle filters, it is assumed that $\varepsilon\le p(x,z)\le\varepsilon^{-1}$ with $\varepsilon>0$. For the local assumption $\varepsilon\le p^v(x,z^v)\le\varepsilon^{-1}$, however, it is not sufficient to assume that $\varepsilon>0$; we must assume that $\varepsilon>\varepsilon_0$ for some strictly positive threshold $\varepsilon_0>0$. Some assumption of this form is absolutely essential in the high-dimensional setting. Unlike the global mixing assumption, the local mixing assumption is not in itself sufficient to ensure that the underlying model will remain ergodic as the dimension $\mathrm{card}\,V\to\infty$: the cumulative effect of the interactions can create long-range correlations that break both ergodicity and any decay of correlations properties. Typically, the model is ergodic when the mixing constant $\varepsilon$ is sufficiently large, but ergodicity breaks abruptly as $\varepsilon$ drops below a threshold value $\varepsilon_0$. Such phenomena, called phase transitions in statistical mechanics, are very common in large-scale interacting systems: see [10, 7] for a number of examples. When the underlying model fails to exhibit ergodicity and decay of correlations, we lack the mechanism that we aim to exploit by developing local particle filters. Therefore, some assumption of the form $\varepsilon>\varepsilon_0$ is essential in Theorem 2.1 in order to ensure the presence of decay of correlations.

Unfortunately, the actual constant $\varepsilon_0$ that arises in the proof of Theorem 2.1 is almost certainly far from optimal. The Dobrushin machinery [8, Chapter 8] that forms the basis of our proof does not yield sharp estimates of the phase transition point even in the simplest classical models of statistical mechanics. It is also far from clear whether the block particle filter should necessarily possess the same phase transition point as the underlying model: it may be that the algorithm only works in a strict subset of the regime in which the underlying model possesses the decay of correlations property. The mathematical tools used in this paper are not sufficiently powerful to address such delicate questions. The practical relevance of Theorem 2.1 is therefore of a qualitative nature: we show that the block particle filter can beat the curse of dimensionality above a certain phase transition point, but the result should not be relied upon to provide quantitative guidance in specific situations. It remains of substantial interest to weaken the assumptions of Theorem 2.1 and to obtain sharper quantitative results; further progress in this direction will require the development of a more sophisticated probabilistic toolbox for the investigation of filtering problems in high dimension.
One drawback of the assumptions of Theorem 2.1 is that the decay of correlations in space and in time are treated on the same footing: as $\varepsilon\to 1$, both the spatial and temporal correlations disappear. To ensure ergodicity and decay of correlations, one would expect that it suffices to assume only that the spatial interactions are sufficiently weak. It is therefore of interest to separate the temporal and spatial ergodicity assumptions, for example, by replacing the assumption $\varepsilon\le p^v(x,z^v)\le\varepsilon^{-1}$ by an assumption of the form $\varepsilon\,q^v(x^v,z^v)\le p^v(x,z^v)\le\varepsilon^{-1}q^v(x^v,z^v)$, with $q^v$ a transition density that depends on $x$ only through $x^v$, which only controls the spatial interactions. Whether our results can still be proved in this more general setting is an interesting topic for further investigation.

It should be noted that the problems investigated in this paper are closely related to fundamental properties of conditional distributions. We have implicitly taken for granted that the filter will be stable when the underlying model is ergodic (and similarly for the decay of correlations property), but it is far from obvious that such properties are in fact preserved under conditioning on the observations. While the inheritance of ergodic properties under conditioning can be proved in a very general setting for models with finite-dimensional observations (see [16] and the references therein), there exist surprising examples in infinite dimension where the filter is non-ergodic even though the underlying model is ergodic and nondegenerate [13]. Such probabilistic phenomena remain poorly understood. The threshold assumption $\varepsilon>\varepsilon_0$ rules out such issues in the setting of this paper.

2.3.2. Local algorithms and spatial homogeneity. The major drawback of the block particle filtering algorithm is the spatial inhomogeneity of the bias. As was explained in section 2.2, the block particle filter introduces errors at the block boundaries. We will increase the size of the blocks as the number of particles $N$ increases, so that more points are distant from the block boundaries and therefore benefit from the decay of correlations. Nonetheless, points near the boundary will always be subject to larger errors, and we can only hope to implement the intuition of section 1.3 at spatial locations that are strictly in the interior of the blocks.

The consequences of this inhomogeneity are manifested quantitatively in Corollary 2.5. Near the block boundaries, Theorem 2.1 gives a bound of order unity. By excluding a small fraction of spatial locations, however, we eliminate the block boundaries to retain an error of order $e^{-c\log^{1/q}N}$ at "most" spatial locations:

$$\mathrm{card}\left\{v\in V : |||\pi_n^x-\hat\pi_n^x|||_v\le c_1e^{-c_2\delta\log^{1/q}N}\right\}\ge(1-\delta)\,\mathrm{card}\,V.$$

If, on the other hand, we compute the spatial average of the error, we obtain an exceedingly slow convergence rate that is much worse than the "typical" rate:

$$\frac{1}{\mathrm{card}\,V}\sum_{v\in V}|||\pi_n^x-\hat\pi_n^x|||_v\le\frac{c_3}{\log^{1/q}N}.$$
Note that the block boundaries constitute a fraction $\sim 1/b$ of spatial locations, where $b$ is the block size; therefore, as $b\sim\log^{1/q}N$ in Corollary 2.5, we see that the error at the block boundaries dominates our bound on the average error.

The behavior of the errors described above seems to be an inherent limitation of the block particle filtering algorithm. It is therefore of significant interest to explore the possibility that one could develop alternative local particle filtering algorithms that are spatially homogeneous. Conceptually, as explained in section 1.3, such an algorithm should update the filtering distribution at each site $v$ using sites in a centered neighborhood $N_b(v) := \{v'\in V : d(v,v')\le b\}$; the decay of correlations should then yield a bias that decays exponentially in $b$. In this case, we would expect to obtain a spatially uniform error bound of the form

$$\max_{v\in V}|||\pi_n^x-\hat\pi_n^x|||_v\le C e^{-c\log^{1/q}N}$$

for some constants $C, c>0$ and the optimized neighborhood size $b\sim\log^{1/q}N$. Whether it is in fact possible to design a local particle filtering algorithm that attains such a uniform error bound is perhaps the most immediate open question that arises from our results.

It is, of course, not at all obvious how one might go about developing a spatially homogeneous algorithm. We will presently discuss one possible idea that could be of interest in this setting. It should be emphasized that the following discussion is intended to be heuristic, as we do not know how to analyze algorithms of the type that we will discuss. However, our aim is to illustrate that the general idea of local particle filters could be much broader than is suggested by the block particle filtering algorithm, and that the mathematical analysis developed in this paper could in itself provide inspiration for further methodological developments.

At the heart of our results lies the decay of correlations. In our proofs, we will use an intuitive notion of decay of correlations of essentially the following form: a probability measure $\rho$ on $\mathbb{X}$ possesses the decay of correlations property if the effect on the conditional distribution $\rho(X^v\in\cdot\,|\,X^{V\setminus\{v\}} = x^{V\setminus\{v\}})$ of a perturbation to $x^{v'}$ decays exponentially in the distance $d(v,v')$ (cf. sections 3.2 and 4.2). The blocking operation evidently replaces these conditional distributions by

$$(\mathsf{B}\rho)(X^v\in\cdot\,|\,X^{V\setminus\{v\}} = x^{V\setminus\{v\}}) = \rho(X^v\in\cdot\,|\,X^{K\setminus\{v\}} = x^{K\setminus\{v\}})$$

for every $K\in\mathcal{K}$ and $v\in K$. Therefore, if $\rho$ possesses the decay of correlations property, then the bias at site $v\in K$ incurred by the blocking operation decays exponentially in the distance between $v$ and the boundary of $K$. From this perspective, an approach to spatially homogeneous algorithms readily suggests itself: we should aim to replace $\mathsf{B}$ with another operator $\mathsf{M}$ that satisfies

$$(\mathsf{M}\rho)(X^v\in\cdot\,|\,X^{V\setminus\{v\}} = x^{V\setminus\{v\}}) = \rho(X^v\in\cdot\,|\,X^{N_b(v)\setminus\{v\}} = x^{N_b(v)\setminus\{v\}})$$
for every $v\in V$. The bias incurred by this operation decays exponentially in $b$ uniformly for all $v$ (it is therefore spatially homogeneous). On the other hand, as

$$(\mathsf{C}_n\mathsf{M}\mathsf{P}\rho)(X^v\in A\,|\,X^{V\setminus\{v\}} = x^{V\setminus\{v\}}) = \frac{\int\mathbf{1}_A(x^v)\,g^v(x^v,Y_n^v)\prod_{w\in N_b(v)}p^w(z,x^w)\,\rho(dz)\,\psi^v(dx^v)}{\int g^v(x^v,Y_n^v)\prod_{w\in N_b(v)}p^w(z,x^w)\,\rho(dz)\,\psi^v(dx^v)},$$

the sampling error incurred if we replace $\rho$ by $\mathsf{S}^N\rho$ in this expression should only be exponential in $\mathrm{card}\,N_b(v)$ (which is $\sim b^q$ for the square lattice) rather than in the model dimension $\mathrm{card}\,V$. This suggests that the local particle filter defined by the recursion $\hat{\mathsf{F}}_n = \mathsf{S}^N\mathsf{C}_n\mathsf{M}\mathsf{P}$ should yield a spatially homogeneous algorithm in accordance with our intuition. To implement this algorithm, one needs to sample from the measure $\mathsf{C}_n\mathsf{M}\mathsf{P}\rho$, which we have defined only implicitly in terms of its conditional distributions. This is, however, precisely the task to which MCMC methods such as the Gibbs sampler are well suited. One would therefore ostensibly obtain a spatially homogeneous local particle filtering algorithm that is recursive in time and that uses MCMC to sample the spatial degrees of freedom (regularization using $\mathsf{M}$ is still key to avoiding the curse of dimensionality, cf. Remark 1.1).

Conceptually, the idea introduced here is quite natural. The general idea of lo- 
cal particle filters is that one should introduce a spatial regularization step into 
the filtering recursion that enables local sampling. In the block particle filter, this 
regularization is provided by the blocking operation B that projects a probability 
measure on the class of measures that are independent across blocks. In the above 
algorithm, we aim to regularize instead by the operation $\mathsf{M}$ that projects a probability measure on the class of Markov random fields of order $b$. The fatal flaw in our reasoning is that the operator $\mathsf{M}$ that we have defined implicitly above does not exist: the truncated conditional distributions $\rho(X^v\in\cdot\,|\,X^{N_b(v)\setminus\{v\}} = x^{N_b(v)\setminus\{v\}})$ are typically not consistent, so there exists no single probability measure that satisfies our definition of $\mathsf{M}\rho$. Nonetheless, the basic idea introduced here could be fruitful if one can develop a practical approach to approximating random fields by Markov random fields (for example, one could attempt to substitute the above expression for $(\mathsf{C}_n\mathsf{M}\mathsf{P}\rho)(X^v\in\cdot\,|\,X^{V\setminus\{v\}} = x^{V\setminus\{v\}})$ in a Gibbs sampler regardless of its inconsistency; we do not know how to analyze the properties of such an algorithm, however). The development of such ideas evidently presents some interesting mathematical as well as methodological challenges that should be investigated further.
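The heuristic mentioned in the parenthetical remark can be sketched as follows. The caveat stressed in the text applies: the truncated conditionals are generally inconsistent, so this is not a sampler for any well-defined measure, and nothing is claimed about its behavior. The sketch assumes a line graph and binary spins, together with hypothetical majority dynamics and flip observations (`p_density`, `g_density`, `eps`, `flip` are illustrative, not from the text). It replaces $\rho$ by the empirical measure of the previous particles in the expression given above for the conditional distribution of $\mathsf{C}_n\mathsf{M}\mathsf{P}\rho$ at $v$.

```python
import math
import random


def p_density(z, w, x_w, r=1, eps=0.6):
    """Hypothetical p^w(z, x^w) on the line graph: x^w copies the majority of z on
    {w-r, ..., w+r} w.p. 1 - eps/2 (density w.r.t. the uniform measure on {0, 1})."""
    window = z[max(0, w - r): w + r + 1]
    majority = 1 if 2 * sum(window) >= len(window) else 0
    return 2.0 - eps if x_w == majority else eps


def g_density(x_v, y_v, flip=0.2):
    """Hypothetical g^v(x^v, y^v): y^v is x^v flipped with probability `flip`."""
    return 2 * (1 - flip) if x_v == y_v else 2 * flip


def truncated_conditional(x, v, particles, y, b):
    """P(X^v = 1 | X^{N_b(v) minus v} = x^{N_b(v) minus v}) from the expression for
    C_n M P rho, with rho replaced by the empirical measure of `particles`.
    N_b(v) = {v-b, ..., v+b} on the line graph; the uniform psi^v cancels in the ratio."""
    nb = range(max(0, v - b), min(len(x), v + b + 1))
    unnorm = []
    for s in (0, 1):
        xs = list(x)
        xs[v] = s
        prior = sum(math.prod(p_density(z, w, xs[w]) for w in nb) for z in particles)
        unnorm.append(g_density(s, y[v]) * prior / len(particles))
    return unnorm[1] / (unnorm[0] + unnorm[1])


def gibbs_sweeps(x, particles, y, b, n_sweeps, rng):
    """Systematic-scan Gibbs sweeps with the truncated conditionals, used regardless of
    whether they are consistent (the unanalyzed heuristic suggested in the text)."""
    x = list(x)
    for _ in range(n_sweeps):
        for v in range(len(x)):
            x[v] = int(rng.random() < truncated_conditional(x, v, particles, y, b))
    return x


# Usage: one heuristic update on 10 sites, each chain started from a previous particle.
rng = random.Random(0)
particles = [[rng.randint(0, 1) for _ in range(10)] for _ in range(20)]
y = [rng.randint(0, 1) for _ in range(10)]
new_particles = [gibbs_sweeps(z, particles, y, b=2, n_sweeps=3, rng=rng) for z in particles]
```

When $N_b(v)$ covers the whole graph, the truncated conditional coincides with the exact conditional distribution of $\mathsf{C}_n\mathsf{P}\mathsf{S}^N\rho$ at $v$. For small $b$ it only looks at $\mathrm{card}\,N_b(v)$ sites, which is the point of the construction.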

Let us finally observe that, by their nature, local particle filtering algorithms are 
well suited to distributed computation: as the particles are updated locally in the 
spatial graph, this opens the possibility of implementing each local neighborhood 
on a separate processor. While this was not the original intention of the algorithms 
we propose, such properties could prove to be advantageous in their own right for 
the practical implementation of filtering algorithms in very large-scale systems. 
2.3.3. High-dimensional models in data assimilation. The basic model that we
have introduced in section 2.1 is prototypical of many data assimilation problems 
and provides a particularly convenient mathematical setting for the investigation of 
filtering problems in high dimension. While such models could be directly relevant 
to many high-dimensional applications, there remains a substantial gap between 
relatively simple models of this form and realistic models used in the most com- 
plex applications, particularly in the geophysical, atmospheric and ocean sciences, 
that frequently consist of coupled systems of partial differential equations. The in- 
vestigation of such complex problems, and the associated numerical, physical, and 
practical issues, is far beyond the scope of this paper. We therefore restrict our 
discussion of such problems to a few brief comments. 

In principle, discrete models as defined in section 2.1 arise naturally as finite- 
difference approximations of stochastic partial differential equations with space- 
time white noise forcing. As the resulting state spaces are not compact, such 
systems cannot satisfy strong mixing assumptions (cf. section 2.3.1), but this is 
likely a mathematical rather than a practical problem. More importantly, it is not 
clear whether the discretized models will be in the regime of decay of correlations
(that is, above the phase transition point) even if the original continuum model pos- 
sesses such properties. It is possible that this requirement would place constraints 
on the spatial and temporal discretization steps, in the spirit of the von Neumann 
stability criterion in numerical analysis. The physics of such problems could also 
impose constraints on the design of local particle filters; for example, it is sug- 
gested in [17, p. 4107] that discontinuities (such as might be introduced at the 
block boundaries in the block particle filtering algorithm) could generate spurious 
gravity waves in ocean models. Such numerical and practical issues are distinct 
from the fundamental problems in high dimension that we aim to address in this 
paper, but can ultimately play an equally important role in complex applications. 

Let us also note that models considered in the data assimilation literature are of- 
ten deterministic partial differential equations without stochastic forcing; the only 
randomness in such models comes from the initial condition (cf. [9, 1]). In de- 
terministic chaotic dynamical systems, it is impossible to obtain time-uniform ap- 
proximations using classical particle filters as there is no dissipation mechanism 
for approximation errors (the filter cannot be stable in this case, cf. section 3.1). 
This issue is not directly related to dimensionality issues in particle filters: such 
problems arise in every deterministic filtering problem. It is natural to regularize 
deterministic systems by adding dynamical noise to the model (there is an exten- 
sive literature on random perturbations of chaotic dynamics, see for example [4]); 
a similar observation has been made by practitioners in the context of ad-hoc fil- 
tering algorithms, cf. [9, section 5]. To our knowledge, a rigorous analysis of such 
ideas in the setting of particle filters has yet to be performed. 
3. Outline of the proof.

3.1. Error decomposition. The goal of Theorem 2.1 is to bound the error between the filter $\pi_n^\mu$ and the block particle filter $\hat\pi_n^\mu$. Recall that both the filter (section 1.1) and block particle filter (section 2.2) are defined recursively:

$$\pi_n^\mu = \mathsf{F}_n\cdots\mathsf{F}_1\mu,\qquad\hat\pi_n^\mu = \hat{\mathsf{F}}_n\cdots\hat{\mathsf{F}}_1\mu,$$

where $\mathsf{F}_n := \mathsf{C}_n\mathsf{P}$ and $\hat{\mathsf{F}}_n := \mathsf{C}_n\mathsf{B}\mathsf{S}^N\mathsf{P}$. We introduce also the block filter

$$\tilde\pi_n^\mu = \tilde{\mathsf{F}}_n\cdots\tilde{\mathsf{F}}_1\mu$$

with $\tilde{\mathsf{F}}_n := \mathsf{C}_n\mathsf{B}\mathsf{P}$. By the triangle inequality, we have

$$|||\pi_n^\mu-\hat\pi_n^\mu|||_J\le|||\pi_n^\mu-\tilde\pi_n^\mu|||_J+|||\tilde\pi_n^\mu-\hat\pi_n^\mu|||_J.$$

The first term on the right-hand side quantifies the bias introduced by the projection on independent blocks, while the second term quantifies the error due to the variance of the random sampling in the algorithm. Each term will be bounded separately to obtain the two terms in the error bound of Theorem 2.1.

The challenges encountered in bounding the bias term (cf. section 3.3) and 
the variance term (cf. section 3.4) are quite different in nature. Nonetheless, both 
bounds are based on a basic scheme of proof that was invented in order to prove 
time-uniform bounds for the bootstrap particle filter [6, 5]. We therefore begin by 
reviewing this general idea, which is based on a simple error decomposition. 

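Suppose for the sake of illustration that we aim to bound directly the error between $\pi_n^\mu$ and $\hat\pi_n^\mu$. The basic idea is to write $\pi_n^\mu-\hat\pi_n^\mu$ as a telescoping sum:

$$\hat\pi_n^\mu-\pi_n^\mu = \sum_{s=1}^n\big\{\mathsf{F}_n\cdots\mathsf{F}_{s+1}\hat{\mathsf{F}}_s\hat{\mathsf{F}}_{s-1}\cdots\hat{\mathsf{F}}_1\mu-\mathsf{F}_n\cdots\mathsf{F}_{s+1}\mathsf{F}_s\hat{\mathsf{F}}_{s-1}\cdots\hat{\mathsf{F}}_1\mu\big\}.$$

By the triangle inequality,

$$|||\pi_n^\mu-\hat\pi_n^\mu|||\le\sum_{s=1}^n|||\mathsf{F}_n\cdots\mathsf{F}_{s+1}\hat{\mathsf{F}}_s\hat\pi_{s-1}^\mu-\mathsf{F}_n\cdots\mathsf{F}_{s+1}\mathsf{F}_s\hat\pi_{s-1}^\mu|||.$$

The term $s$ in this sum could be interpreted as the contribution to the total error at time $n$ due to the filter approximation made in time step $s$.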

The key insight is now that one can employ the filter stability property to control
this sum uniformly in time. In its simplest form, the property states the
following: if $\varepsilon\le p(x,z)\le\varepsilon^{-1}$ for all $x,z\in\mathbb{X}$, then [6, 5]

$$|||F_n\cdots F_{s+1}\rho - F_n\cdots F_{s+1}\rho'||| \le \varepsilon^{-2}(1-\varepsilon^2)^{n-s}\,|||\rho-\rho'|||.$$
Thus the filter forgets its initial condition at an exponential rate. However, this also
means that past approximation errors are forgotten at an exponential rate: if we
substitute the stability property in the above error decomposition, we obtain

$$|||\pi_n^\mu-\hat\pi_n^\mu||| \le \varepsilon^{-2}\sum_{s=1}^n(1-\varepsilon^2)^{n-s}\,|||F_s\hat\pi_{s-1}^\mu-\hat F_s\hat\pi_{s-1}^\mu||| \le \varepsilon^{-4}\sup_{n,\rho}|||F_n\rho-\hat F_n\rho|||.$$

Thus, if we can control the error $|||F_n\rho-\hat F_n\rho|||$ in a single time step, we obtain a
time-uniform bound of the same order. In the case of the bootstrap particle filter,
if $\kappa\le g(x,y)\le\kappa^{-1}$, we proved in section 1.1 that $|||F_n\rho-\hat F_n\rho|||\le 2\kappa^{-2}/\sqrt{N}$,
and we obtain a time-uniform version of the crude error bound given there.
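Both ingredients of this argument can be checked on a toy model. The sketch below makes the following assumptions:

- The state space is finite with uniform reference measure.
- The kernel has a density in $[\varepsilon,\varepsilon^{-1}]$.
- The likelihoods are arbitrary and positive.
- A hypothetical perturbed kernel `P_hat` stands in for the approximate one-step map $\hat F_s$. Any map would do for the telescoping identity.

It first verifies that the telescoping terms sum to $\pi_n^\mu-\hat\pi_n^\mu$. It then checks that two exact filters started from different initial laws approach each other at least as fast as $\varepsilon^{-2}(1-\varepsilon^2)^{n-s}$ predicts. For deterministic measures $|||\cdot|||$ reduces to $\|\cdot\|$, which on a finite space is the $\ell^1$ distance.

```python
import numpy as np


def mixing_kernel(m, eps, rng):
    """Transition matrix whose density w.r.t. the uniform law on m states lies in [eps, 1/eps]."""
    u = rng.uniform(-1.0, 1.0, size=(m, m))
    u -= u.mean(axis=1, keepdims=True)             # each row of u averages to zero
    u /= np.abs(u).max(axis=1, keepdims=True)      # now |u| <= 1
    return (1.0 + (1.0 - eps) * u) / m             # density in [eps, 2 - eps]; rows sum to 1


def step(rho, P, g):
    """One filter step F = C P: predict with P, reweight by g, renormalize."""
    post = (rho @ P) * g
    return post / post.sum()


def run(rho, P, gs):
    """Apply the filter steps for the likelihoods in gs, in order."""
    for g in gs:
        rho = step(rho, P, g)
    return rho


def telescoping_terms(mu, P, P_hat, gs):
    """Terms F_n...F_{s+1}F_s hat_pi_{s-1} - F_n...F_{s+1}hat_F_s hat_pi_{s-1}, s = 1..n."""
    hat = [mu]
    for g in gs:
        hat.append(step(hat[-1], P_hat, g))         # hat[s] = hat_pi_s
    terms = []
    for s in range(1, len(gs) + 1):
        a = run(step(hat[s - 1], P, gs[s - 1]), P, gs[s:])   # F_n...F_{s+1} F_s hat_pi_{s-1}
        b = run(hat[s], P, gs[s:])                           # F_n...F_{s+1} hat_F_s hat_pi_{s-1}
        terms.append(a - b)
    return run(mu, P, gs) - hat[-1], terms
```

The first value returned by `telescoping_terms` agrees with the sum of the terms up to rounding error. The stability comparison only needs `run` applied to two initial laws, for example two point masses.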

The basic error decomposition discussed above allows us to separate the prob- 
lem of obtaining time-uniform bounds into two parts: the one-step approximation 
error and the stability property. It is important to note, however, that both parts 
become problematic in high dimension. We have already seen (section 1.2) that 
the one-step approximation error of the bootstrap particle filter is exponential in 
the model dimension; we will surmount this problem by working with the block 
particle filtering algorithm and performing a local analysis of the one-step error 
using the decay of correlations property (which must itself be established). On 
the other hand, the filter stability bound used above also becomes exponentially
worse in high dimension: a local bound of the form $\varepsilon\le p^v(x,z^v)\le\varepsilon^{-1}$ only
yields $\varepsilon^{\mathrm{card}\,V}\le p(x,z)\le\varepsilon^{-\mathrm{card}\,V}$, which is exponential in the model dimension
$\mathrm{card}\,V$. To surmount this problem, we must develop a much more precise
understanding of the filter stability property in high dimension, which proves to
be closely related to the decay of correlations property. The development of these
ingredients constitutes the bulk of the proof of Theorem 2.1.
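The single-step bootstrap bound $|||F_n\rho-\hat F_n\rho|||\le 2\kappa^{-2}/\sqrt{N}$ recalled above can be probed numerically, and it also makes the dimension dependence concrete. If $g$ is a product of $d$ site likelihoods, each taking values in $[\kappa_0,\kappa_0^{-1}]$, then the hypothesis holds with $\kappa=\kappa_0^d$, so the bound grows like $\kappa_0^{-2d}$. The sketch below assumes a toy finite state space and a single test function `f`. It therefore only estimates the error for that particular $f$, not the supremum over $f$ that defines $|||\cdot|||$.

```python
import numpy as np


def one_step_rmse(rho, P, g, f, N, reps, rng):
    """Monte Carlo RMSE of the bootstrap step C S^N P versus the exact step C P, for one f."""
    pred = rho @ P                                      # P rho
    exact = (pred * g) @ f / (pred @ g)                 # (C P rho)(f)
    errs = np.empty(reps)
    for k in range(reps):
        draws = rng.choice(len(pred), size=N, p=pred)   # S^N: N samples from P rho
        w = g[draws]
        errs[k] = (w @ f[draws]) / w.sum() - exact      # C applied to the empirical measure
    return np.sqrt(np.mean(errs ** 2))


def bootstrap_bound(kappa, N):
    """The crude one-step bound 2 kappa^{-2} / sqrt(N)."""
    return 2 * kappa ** -2 / np.sqrt(N)
```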

3.2. Dobrushin comparison method. How can one control the approximation 
error of high-dimensional distributions? The basic idea that we aim to exploit, both 
algorithmically and mathematically, is that the decay of correlations property leads 
to a form of localization: the effect on the distribution in some spatial set $J$ of a perturbation
made in another set $J'$ decays rapidly in the distance $d(J,J')$. Therefore,
as long as we measure the error locally (in $|||\cdot|||_J$ rather than $|||\cdot|||$), one would hope to
control the spatial accumulation of approximation errors much as we controlled the 
accumulation of approximation errors in time using the filter stability property. We 
will presently introduce a powerful (albeit blunt) tool — the Dobrushin comparison 
theorem — that makes this idea precise in a very general setting. This fundamental 
result, which plays an important role in statistical mechanics [8, Chapter 8], is the 
main workhorse that will be used repeatedly in our proofs. 

Let $I$ be a finite set, and let $\mathbb{S}=\prod_{i\in I}\mathbb{S}^i$, where $\mathbb{S}^i$ is a Polish space for each
$i\in I$. Define the coordinate projections $X^i:x\mapsto x^i$ for $x\in\mathbb{S}$ and $i\in I$. For any
probability $\rho$ on $\mathbb{S}$, we fix a version $\rho^i_x$ of the regular conditional probability

$$\rho^i_x(A) := \rho(X^i\in A\,|\,X^{I\setminus\{i\}}=x^{I\setminus\{i\}}).$$

We also define for $J\subseteq I$ the local total variation distance

$$\|\rho-\rho'\|_J := \sup_{f\in\mathbb{S}^J:\,|f|\le 1}|\rho(f)-\rho'(f)|,$$

where $\mathbb{S}^J$ is the class of measurable functions $f:\mathbb{S}\to\mathbb{R}$ such that $f(x)=f(z)$
whenever $x^J=z^J$. For $J=I$ we write $\|\rho-\rho'\|$ for simplicity.
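On a finite product space the supremum defining $\|\rho-\rho'\|_J$ is attained by $f=\pm1$, chosen according to the sign of the difference of the $J$-marginals. The local distance is therefore the $\ell^1$ distance between the marginals of $\rho$ and $\rho'$ on the coordinates in $J$. The sketch below assumes that measures are stored as arrays with one axis per site. This encoding is purely illustrative and is not used in the paper.

```python
import numpy as np


def local_tv(rho, rho2, J):
    """||rho - rho2||_J for measures on a finite product space.

    rho, rho2 : arrays with one axis per site, entries summing to 1
    J         : iterable of site indices (axes) to keep
    """
    J = sorted(set(J))
    other = tuple(ax for ax in range(rho.ndim) if ax not in J)
    # Functions in S^J only see the J-marginals, so marginalize out the other sites.
    diff = rho.sum(axis=other) - rho2.sum(axis=other)
    # The supremum over |f| <= 1 of |sum f * diff| is attained at f = sign(diff).
    return np.abs(diff).sum()
```

Enlarging $J$ can only increase the distance, since marginalization contracts the $\ell^1$ norm. Taking $J=I$ recovers $\|\rho-\rho'\|$.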

We can now state the Dobrushin comparison theorem [8, Theorem 8.20].¹

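Theorem 3.1 (Dobrushin). Let $\rho,\tilde\rho$ be probability measures on $\mathbb{S}$. Define

$$C_{ij} := \frac{1}{2}\sup_{x,z\in\mathbb{S}:\,x^{I\setminus\{j\}}=z^{I\setminus\{j\}}}\|\rho^i_x-\rho^i_z\|,\qquad b_j := \sup_{x\in\mathbb{S}}\|\rho^j_x-\tilde\rho^j_x\|.$$

Suppose that the Dobrushin condition holds:

$$\max_{i\in I}\sum_{j\in I}C_{ij}<1.$$

Then the matrix sum $D:=\sum_{n\ge 0}C^n$ is convergent, and we have for every $J\subseteq I$

$$\|\rho-\tilde\rho\|_J\le\sum_{i\in J}\sum_{j\in I}D_{ij}\,b_j.$$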

This result could be informally interpreted as follows. $C_{ij}$ measures the degree
to which a perturbation of site $j$ directly affects site $i$ under the distribution $\rho$. However,
perturbing site $j$ might also indirectly affect $i$: it could affect another site $k$,
which in turn affects $i$, etc. The aggregate effect of a perturbation of site $j$ on site $i$ is
captured by the quantity $D_{ij}$. In this setting, a useful manifestation of the decay of
correlations property is that $D_{ij}$ decays exponentially in the distance $d(i,j)$. If this
is in fact the case, then Theorem 3.1 yields, for example, $\|\rho-\tilde\rho\|_{\{i\}}\lesssim\sum_j e^{-\beta d(i,j)}b_j$,
where $b_j$ measures the local error at site $j$ between $\rho$ and $\tilde\rho$ (in terms of the conditional
distributions $\rho^j_x$ and $\tilde\rho^j_x$). The decay of correlations property therefore controls
the accumulation of local errors much as one might expect.
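Theorem 3.1 can be checked by brute force on a tiny example. The sketch below assumes two Ising-type measures on $\{-1,1\}^4$ along a chain. This is an illustrative model, not one from the paper, and the two measures differ only in their external fields. The code proceeds as follows:

1. It computes $C_{ij}$ and $b_j$ exactly by enumerating configurations.
2. It forms $D=(I-C)^{-1}$, which equals $\sum_{n\ge0}C^n$ when the Dobrushin condition holds.
3. It compares the two sides of the comparison bound.

All norms are $\ell^1$ distances, matching the convention $\|\cdot\|=\sup_{|f|\le1}|\cdot(f)|$.

```python
import itertools
import numpy as np


def chain_gibbs(h, coupling, n):
    """Ising-type law on {-1, 1}^n with weights exp(sum_i h_i x_i + coupling * sum_i x_i x_{i+1})."""
    configs = np.array(list(itertools.product([-1, 1], repeat=n)))
    energy = configs @ h + coupling * (configs[:, :-1] * configs[:, 1:]).sum(axis=1)
    w = np.exp(energy)
    return configs, w / w.sum()


def conditional(configs, prob, x, i):
    """rho^i_x as a probability vector over x^i in (-1, 1)."""
    match = np.all(np.delete(configs, i, axis=1) == np.delete(x, i), axis=1)
    vals, p = configs[match, i], prob[match]
    out = np.array([p[vals == s].sum() for s in (-1, 1)])
    return out / out.sum()


def dobrushin_sides(h1, h2, coupling, n, sites):
    """Return (||rho - rho~||_J, sum_{i in J} sum_j D_ij b_j) for J = sites."""
    configs, p = chain_gibbs(h1, coupling, n)
    _, q = chain_gibbs(h2, coupling, n)
    C, b = np.zeros((n, n)), np.zeros(n)
    for x in configs:
        for j in range(n):
            z = x.copy()
            z[j] = -z[j]                                # z agrees with x off site j
            b[j] = max(b[j], np.abs(conditional(configs, p, x, j)
                                    - conditional(configs, q, x, j)).sum())
            for i in range(n):
                if i != j:                              # rho^i_x does not depend on x^i
                    tv = 0.5 * np.abs(conditional(configs, p, x, i)
                                      - conditional(configs, p, z, i)).sum()
                    C[i, j] = max(C[i, j], tv)
    assert C.sum(axis=1).max() < 1, "Dobrushin condition fails"
    D = np.linalg.inv(np.eye(n) - C)                    # D = sum_{k >= 0} C^k
    others = tuple(ax for ax in range(n) if ax not in sites)
    shape = (2,) * n                                    # axis k indexes x^k in (-1, 1)
    marg_p = p.reshape(shape).sum(axis=others)
    marg_q = q.reshape(shape).sum(axis=others)
    lhs = np.abs(marg_p - marg_q).sum()
    rhs = sum(D[i] @ b for i in sites)
    return lhs, rhs
```

With a weak coupling, each $C_{ij}$ is small and the Dobrushin condition holds comfortably. The assertion inside the function guards against choosing parameters for which the theorem does not apply.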

Let us now explain how Theorem 3.1 will be applied in the filtering setting. For
the sake of illustration, consider the problem of obtaining a local filter stability bound:
that is, we would like to bound $\|\pi_n^x-\pi_n^{\tilde x}\|_J$ for $x,\tilde x\in\mathbb{X}$ and $J\subseteq V$. It would seem
natural to apply Theorem 3.1 directly with $I=V$, $\mathbb{S}=\mathbb{X}$, and $\rho=\pi_n^x$, $\tilde\rho=\pi_n^{\tilde x}$.



¹Note that our definition of $\|\cdot\|_J$ differs by a factor 2 from that in [8].






PATRICK REBESCHINI AND RAMON VAN HANDEL 



This is not useful, however, as we do not know how to control the corresponding
local quantities such as $\rho^v_z(A)=\mathbf{P}^x[X_n^v\in A\,|\,Y_1,\ldots,Y_n,X_n^{V\setminus\{v\}}=z^{V\setminus\{v\}}]$.

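Instead, define $I=\{0,\ldots,n\}\times V$ and $\mathbb{S}=\mathbb{X}^{n+1}$, and let

$$\rho=\mathbf{P}^x[X_0,\ldots,X_n\in\cdot\,|\,Y_1,\ldots,Y_n],$$

$$\tilde\rho=\mathbf{P}^{\tilde x}[X_0,\ldots,X_n\in\cdot\,|\,Y_1,\ldots,Y_n].$$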

As

$$\|\pi_n^x-\pi_n^{\tilde x}\|_J=\|\rho-\tilde\rho\|_{\{n\}\times J},$$

we can now apply Theorem 3.1 to the smoothing distributions $\rho,\tilde\rho$. Unlike the filters
$\pi_n^x,\pi_n^{\tilde x}$, however, $\rho$ and $\tilde\rho$ are Markov random fields on $I$ (cf. Figure 3), so that
the conditional distributions $\rho^{k,v}_x$ and $\tilde\rho^{k,v}_x$ can be easily computed and controlled
in terms of the local densities $p^v(x,z^v)$ and $g^v(x^v,y^v)$. For example, as

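$$\rho(A)\propto\int\mathbf{1}_A(x,x_1,\ldots,x_n)\prod_{k=1}^n\prod_{v\in V}p^v(x_{k-1},x_k^v)\,g^v(x_k^v,Y_k^v)\,\psi^v(dx_k^v),$$

and as $p^w(x_{k-1},x_k^w)$ depends on $x_{k-1}$ only through the coordinates $x_{k-1}^v$ with $d(w,v)\le r$, we obtain

$$\rho^{k,v}_x(B)\propto\int\mathbf{1}_B(x_k^v)\,p^v(x_{k-1},x_k^v)\,g^v(x_k^v,Y_k^v)\prod_{w\in N(v)}p^w(x_k,x_{k+1}^w)\,\psi^v(dx_k^v)$$

for $0<k<n$ and $v\in V$ (the proportionality is up to a normalization factor).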
We will repeatedly exploit expressions of this type to obtain explicit bounds on 
the quantities $C_{ij}$ and $b_j$ that appear in Theorem 3.1. It should be emphasized that
$\rho^{k,v}_x$ is a genuinely local quantity: the product inside the integral contains at most
$\mathrm{card}\,N(v)\le\Delta$ terms. We will consequently be able to use Theorem 3.1 to obtain
bounds that do not depend on the model dimension $\mathrm{card}\,V$.

Remark 3.2. In the language of statistical mechanics, we exploit the fact that
the smoothing distribution $\mathbf{P}^x[X_0,\ldots,X_n\in\cdot\,|\,Y_1,\ldots,Y_n]$ is a Gibbs measure [8]
on the space-time index set / = {0, . . . , n} x V. Similar insight has proved to be 
fruitful in the ergodic theory of large-scale interacting Markov chains, cf. [10]. 

3.3. Bounding the bias: decay of correlations. To bound the bias $\|\pi_n^\mu-\tilde\pi_n^\mu\|_J$,
we follow the basic error decomposition scheme described above: that is,

$$\|\pi_n^\mu-\tilde\pi_n^\mu\|_J\le\sum_{s=1}^n\|F_n\cdots F_{s+1}F_s\tilde\pi_{s-1}^\mu-F_n\cdots F_{s+1}\tilde F_s\tilde\pi_{s-1}^\mu\|_J.$$



LOCAL PARTICLE FILTERS 






To implement our program, we must now obtain suitable local bounds on the sta- 
bility of the filter and on the one-step approximation error. Both these problems 
will be approached by application of the Dobrushin comparison theorem.

In its most basic form, one can prove a filter stability property of the following
type: provided $\varepsilon>\varepsilon_0$, there exists $\beta>0$ (depending only on $\Delta$ and $r$) such that

$$\|F_n\cdots F_{s+1}\mu-F_n\cdots F_{s+1}\nu\|_J\le 4\,\mathrm{card}\,J\,e^{-\beta(n-s)}$$

for any probability measures $\mu,\nu$ on $\mathbb{X}$, $J\subseteq V$ and $s\le n$ (cf. Corollary 4.7). This
bound is evidently dimension-free, unlike the crude filter stability bound described
in section 3.1. Nonetheless, this filter stability bound would yield a trivial result
when substituted in the error decomposition, as it does not provide any control in
terms of the distance between $\mu$ and $\nu$ (and therefore in terms of the one-step error).
Instead, we will prove in section 4.2 the local stability bound

$$\|F_n\cdots F_{s+1}\mu-F_n\cdots F_{s+1}\nu\|_J\le 2e^{-\beta(n-s)}\sum_{v\in J}\max_{v'\in V}e^{-\beta d(v,v')}D_{v'}(\mu,\nu),$$

where $D_{v'}(\mu,\nu)$ is a suitable measure of the local error between $\mu$ and $\nu$ at site $v'$
that arises naturally from the Dobrushin comparison theorem (see Proposition 4.4
for precise expressions). This filter stability bound is genuinely local: the stability
on the spatial set $J\subseteq V$ depends predominantly on the local distance of the initial
conditions near $J$ (that is, the spatial accumulation of errors is mitigated). This
localization comes at a price, however: the local filter stability bound holds only if
the initial condition $\mu$ satisfies a priori a decay of correlations property.

Once the local filter stability bound is substituted in the error decomposition, it
remains to prove a bound on the one-step error $D_v(F_s\tilde\pi_{s-1}^\mu,\tilde F_s\tilde\pi_{s-1}^\mu)$ with respect
to the local distance prescribed by the filter stability bound. This will be done in
section 4.3: we will show that, for a constant $C$ that depends only on $\Delta,r,\varepsilon$,

$$D_v(F_s\mu,\tilde F_s\mu)\le Ce^{-\beta d(v,\partial K)}$$

for every $K\in\mathcal{K}$ and $v\in K$, provided again that $\mu$ satisfies a priori a decay
of correlations property. This is precisely what we expect: as $B$ only introduces
errors at the block boundaries, the decay of correlations should ensure that the
error at site $v$ decays exponentially in the distance to the nearest block boundary.
The Dobrushin comparison theorem allows us to make this intuition precise.
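The quantity $d(v,\partial K)$ is easy to compute in simple geometries. The sketch below works on an assumed one-dimensional lattice $V=\{0,\ldots,L-1\}$ with $d(v,w)=|v-w|$, interaction radius $r$ and $N(v)=\{w: d(v,w)\le r\}$. It takes the boundary $\partial K$, as an assumption for this illustration, to be the set of sites $v\in K$ whose neighbourhood $N(v)$ is not contained in $K$. The helper names are hypothetical.

```python
def block_boundary(block, r, L):
    """Sites v in `block` whose neighbourhood {w in V : |v - w| <= r} leaves the block."""
    members = set(block)
    return [v for v in block
            if any(w not in members for w in range(max(0, v - r), min(L, v + r + 1)))]


def dist_to_boundary(v, block, r, L):
    """d(v, dK) on the 1-D lattice; infinite if the block has no boundary."""
    bd = block_boundary(block, r, L)
    return min((abs(v - w) for w in bd), default=float("inf"))
```

With blocks of four sites and $r=1$, interior sites are at distance 1 from the boundary. A block touching the end of the lattice has a boundary on one side only, so its sites can lie further from $\partial K$.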

The decay of correlations property evidently plays a dual role in our setting: it
controls the approximation error of the block filter, which is the basic principle be- 
hind the block particle filtering algorithm; at the same time, it mitigates the spatial 
accumulation of approximation errors, which is essential for proving dimension- 
free bounds. In order to apply the above bounds, the key step that remains is to 
prove that the appropriate decay of correlations property does in fact hold, uniformly
in time, for the block filter $\tilde\pi_n^\mu$. The latter will be shown in section 4.4 by
iterating a one-step decay of correlations bound that is obtained once again using
the Dobrushin comparison theorem. We conclude by putting together all these
ingredients in section 4.5 to obtain a bound on the bias of the form

$$\|\pi_n^\mu-\tilde\pi_n^\mu\|_J\le C\,\mathrm{card}\,J\,e^{-\beta d(J,\partial K)}$$

for $J\subseteq K$ (Theorem 4.14). This proves the first half of Theorem 2.1 (note that,
as the bias does not depend on the random sampling in the block particle filtering
algorithm, we can trivially replace $\|\pi_n^\mu-\tilde\pi_n^\mu\|_J$ by $|||\pi_n^\mu-\tilde\pi_n^\mu|||_J$ in this bound).

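3.4. Bounding the variance: the computation tree. To bound the variance term
$|||\tilde\pi_n^\mu-\hat\pi_n^\mu|||_J$, we once again start from the basic error decomposition

$$|||\tilde\pi_n^\mu-\hat\pi_n^\mu|||_J\le\sum_{s=1}^n|||\tilde F_n\cdots\tilde F_{s+1}\tilde F_s\hat\pi_{s-1}^\mu-\tilde F_n\cdots\tilde F_{s+1}\hat F_s\hat\pi_{s-1}^\mu|||_J.$$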

The difficulties encountered in controlling this expression are quite different in 
nature, however, than what was needed to control the bias term. 

Dimension-free bounds on the bias exploit decay of correlations: the core difficulty
is to obtain local control of the error inside the blocks. The variance term,
on the other hand, will already grow exponentially in the size of the blocks due to
the exponential dependence of the sampling error on the dimension of the observations.
There is therefore no need to bound the error on a finer scale than a single
block. This makes the analysis of the variance much less delicate than controlling 
the bias, and it is indeed not difficult to obtain a variance bound of the right order 
on a finite time horizon (but growing exponentially in time n). 

The chief difficulty in controlling the variance is to obtain a time-uniform bound. 
Note that, in the error decomposition for the variance term, it is not stability of the
filter $\pi_n$ that enters the picture but rather stability of the block filter $\tilde\pi_n$. Unlike
the filter, however, which has by construction an interpretation as the marginal of 
a smoothing distribution, the block filter is defined by a recursive algorithm and 
not as a conditional expectation. It is therefore not entirely obvious how one could 
adapt the approach outlined in section 3.2 to this setting. 

The key idea that will be used to establish stability is that the block filter can 
nonetheless be viewed as the marginal of a suitably defined Markov random field, 
just like the filter can be viewed as the marginal of a smoothing distribution. This 
random field, however, lives on a much larger index set than the original model. 
The basic idea behind the construction is illustrated in Figure 5 (disregarding the
observations for simplicity of exposition). When we apply the transition operator 




Fig 5. For a linear spatial graph $G$ partitioned into blocks A-E (with $r=1$), the dependencies
between the blocks at subsequent times are illustrated here. The left dependency graph represents
$BP^2\mu$, the right graph represents $BPBP\mu$. The blocking operation unravels the original graph
into a tree by introducing independent duplicates (dotted boxes) of blocks in the previous time step.



$P$, each block interacts with its $\Delta_{\mathcal{K}}$ neighbors in the previous time step. However, if
we subsequently apply the blocking operator B, then each block is replaced by an 
independent copy. This could be modelled equivalently by introducing independent 
duplicates of the blocks in the previous time step, and having each block interact 
with its own set of duplicates. This unravels the original dependency graph into a 
tree. By iterating this process, we can express the block filter as the marginal of a 
Markov random field defined on a tree that contains many independent duplicates 
of each block. We call this construction the computation tree in analogy with a 
similar notion that arises in the analysis of belief propagation algorithms [15]. 
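The unravelling can be mimicked combinatorially. The sketch below uses a hypothetical encoding: each block is a label, and an adjacency map lists, for each block, the blocks it interacts with in the previous time step, including itself. It builds the computation tree to a given depth by giving every child its own independent copy. Counting the nodes shows how quickly the number of duplicates grows, roughly like $\Delta_{\mathcal{K}}^{\,t}$ after $t$ steps. As in Figure 5, observations are ignored.

```python
def computation_tree(root, depth, neighbours):
    """Unravel block dependencies into a tree: (block, [subtrees of independent duplicates])."""
    if depth == 0:
        return (root, [])
    return (root, [computation_tree(b, depth - 1, neighbours) for b in neighbours[root]])


def tree_size(tree):
    """Number of nodes, i.e. of block duplicates, in the computation tree."""
    _, children = tree
    return 1 + sum(tree_size(c) for c in children)


# Linear chain of blocks A-E with r = 1: each block depends on itself and its adjacent blocks.
chain = {"A": "AB", "B": "ABC", "C": "BCD", "D": "CDE", "E": "DE"}
```

For the middle block C, two steps already produce $1+3+9=13$ nodes, with block C itself appearing several times as independent copies.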

With this construction in place, we can now obtain a stability bound for the 
block filter by applying the Dobrushin comparison theorem to the computation 
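tree. This will be done in section 4.6 to obtain a bound of the following form:
provided $\varepsilon>\varepsilon_0$, there exist $\beta,\beta'>0$ (depending only on $\Delta,\Delta_{\mathcal{K}},r$) such that

$$\max_{K\in\mathcal{K}}\|\tilde F_n\cdots\tilde F_{s+1}\mu-\tilde F_n\cdots\tilde F_{s+1}\nu\|_K\le e^{\beta'|\mathcal{K}|_\infty}e^{-\beta(n-s)}\max_{K\in\mathcal{K}}\|\mu^K-\nu^K\|$$

for any pair of initial conditions of product form $\mu=\bigotimes_{K\in\mathcal{K}}\mu^K$, $\nu=\bigotimes_{K\in\mathcal{K}}\nu^K$
(cf. Corollary 4.18). Combining this bound with the error decomposition, we obtain
in section 4.7 a time-uniform bound on the variance term of the form

$$\max_{K\in\mathcal{K}}|||\tilde\pi_n^\mu-\hat\pi_n^\mu|||_K\le C\,\frac{e^{\beta'|\mathcal{K}|_\infty}}{N^{\gamma}},$$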



where we bound the one-step error in the same spirit as the computation for the
bootstrap particle filter in section 1.1 (however, a more involved argument is needed
here to surmount the fact that the block filter stability bound is given in a total
variation norm rather than the weaker norm $|||\cdot|||_J$). Thus Theorem 2.1 is proved.

Remark 3.3. The reason we must consider stability of the block filter is that
we have first split the error into the bias $|||\pi_n^\mu-\tilde\pi_n^\mu|||_J$ and variance $|||\tilde\pi_n^\mu-\hat\pi_n^\mu|||_J$
parts, and then applied the error decomposition to each term separately. One might
hope to circumvent the problem by applying the error decomposition directly to
the total error $|||\pi_n^\mu-\hat\pi_n^\mu|||_J$ as was illustrated in section 3.1, and then splitting the
one-step error terms in this bound into bias and variance parts:

$$\begin{aligned}
|||F_n\cdots F_{s+1}F_s\hat\pi_{s-1}^\mu-F_n\cdots F_{s+1}\hat F_s\hat\pi_{s-1}^\mu|||_J
&\le|||F_n\cdots F_{s+1}F_s\hat\pi_{s-1}^\mu-F_n\cdots F_{s+1}\tilde F_s\hat\pi_{s-1}^\mu|||_J\\
&\quad+|||F_n\cdots F_{s+1}\tilde F_s\hat\pi_{s-1}^\mu-F_n\cdots F_{s+1}\hat F_s\hat\pi_{s-1}^\mu|||_J.
\end{aligned}$$

In this case, only stability of the filter is needed to control the error accumulation. 

Unfortunately, using this simpler approach it is impossible to obtain a nontrivial
bound on the bias. Indeed, to control the one-step bias $D_v(F_s\mu,\tilde F_s\mu)$, it is essential
that $\mu$ satisfies a decay of correlations property. In section 3.3, the error decomposition
required us to obtain such a bound for $\mu=\tilde\pi_{s-1}^\mu$, and we showed that
the block filter does indeed possess the requisite decay of correlations property.
On the other hand, if we apply the error decomposition to the total error as above,
one would have to obtain such a bound for $\mu=\hat\pi_{s-1}^\mu$. This is impossible, as $\hat\pi_{s-1}^\mu$
cannot possess a useful decay of correlations property within the blocks.

To see this, consider what happens when we apply the Dobrushin comparison
theorem to an empirical measure $\rho=\frac{1}{N}\sum_{k=1}^N\delta_{x_k}$ with $x_k$ i.i.d. $\sim\nu$. Suppose that
$\nu=\bigotimes_{i\in I}\nu^i$ for some (nonatomic) measures $\nu^i$: this is the extreme case where
$\nu$ has no spatial correlations at all. Nonetheless, the empirical measure $\rho$ will be
maximally correlated: as each $x_k^i$ is distinct with unit probability, we obtain $\rho^i_x=\delta_{x^i}$
for every $x\in\{x_1,\ldots,x_N\}$, so that $C_{ij}=1$ for every $i\neq j$ in Theorem 3.1.
We therefore see that sampling destroys decay of correlations (this is, in essence,
the same phenomenon that causes the curse of dimensionality of particle filters).
For this reason, it is essential to consider the bias and variance terms separately.
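A short simulation shows why $\rho^i_x$ degenerates. The sketch assumes independent uniform coordinates, which give a nonatomic product law, and at least two sites. It draws $N$ samples. For a sample point $x$, it collects the values of coordinate $i$ among all samples that agree with $x$ off site $i$. Almost surely this set is just $\{x^i\}$, so the conditional law of the empirical measure at $i$ is a point mass. The helper name is hypothetical.

```python
import numpy as np


def conditional_support(samples, x, i):
    """Values of coordinate i among the samples that agree with x on all other coordinates."""
    others = np.delete(samples, i, axis=1)
    match = np.all(others == np.delete(x, i), axis=1)
    return samples[match, i]
```

Running this for every sample point of a cloud drawn from a product of uniforms returns a single value each time, namely the point's own coordinate. The empirical measure thus "remembers" every coordinate from the others, even though the sampled law has independent coordinates.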

4. Proof of Theorem 2.1. Theorem 2.1 yields a bound on $|||\pi_n^\mu-\hat\pi_n^\mu|||_J$. As

$$|||\pi_n^\mu-\hat\pi_n^\mu|||_J\le|||\pi_n^\mu-\tilde\pi_n^\mu|||_J+|||\tilde\pi_n^\mu-\hat\pi_n^\mu|||_J,$$

it suffices to bound each term in this inequality. As was explained in section 3.1,
the first term quantifies the bias of the block particle filter, while the second term
quantifies the variance of the random sampling. The bias term will be bounded in
Theorem 4.14 below, while the variance will be bounded in Theorem 4.23. The
combination of these two results immediately yields Theorem 2.1.
4.1. Preliminary lemmas. The Dobrushin comparison method introduced in
section 3.2 is the main workhorse of our proof. To use this method, we must be
able to bound the quantities $C_{ij}$, $b_j$, and $D_{ij}$ that appear in Theorem 3.1. The goal
of this preliminary section is to collect some elementary lemmas for this purpose.

We start with a rather trivial lemma that will be used to bound $C_{ij}$.

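Lemma 4.1. Let $\nu,\nu',\gamma,\gamma'$ be probability measures and let $\varepsilon>0$ be such that
$\nu(A)\ge\varepsilon\gamma(A)$ and $\nu'(A)\ge\varepsilon\gamma'(A)$ for every measurable set $A$. Then

$$\|\nu-\nu'\|\le 2(1-\varepsilon)+\varepsilon\|\gamma-\gamma'\|.$$

In particular, if $\gamma=\gamma'$, then $\|\nu-\nu'\|\le 2(1-\varepsilon)$.

Proof. As $\mu=(1-\varepsilon)^{-1}(\nu-\varepsilon\gamma)$ and $\mu'=(1-\varepsilon)^{-1}(\nu'-\varepsilon\gamma')$ are probability
measures and $\nu-\nu'=(1-\varepsilon)(\mu-\mu')+\varepsilon(\gamma-\gamma')$, the result follows readily from $\|\mu-\mu'\|\le 2$. □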
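As a sanity check, the sketch below builds discrete measures that satisfy the hypothesis by construction, mirroring the decomposition in the proof:

- $\nu=\varepsilon\gamma+(1-\varepsilon)\mu$,
- $\nu'=\varepsilon\gamma'+(1-\varepsilon)\mu'$.

It then compares both sides of Lemma 4.1. Here $\|\cdot\|$ is computed as the $\ell^1$ distance, which equals the supremum over $|f|\le1$ on a finite space. The random Dirichlet inputs are arbitrary test data.

```python
import numpy as np


def lemma41_sides(eps, m=6, seed=0):
    """Return (||nu - nu'||, 2(1 - eps) + eps ||gamma - gamma'||) for random discrete inputs."""
    rng = np.random.default_rng(seed)
    gam, gam2, mu, mu2 = (rng.dirichlet(np.ones(m)) for _ in range(4))
    nu = eps * gam + (1 - eps) * mu            # nu >= eps * gam setwise
    nu2 = eps * gam2 + (1 - eps) * mu2         # nu' >= eps * gam' setwise
    lhs = np.abs(nu - nu2).sum()
    rhs = 2 * (1 - eps) + eps * np.abs(gam - gam2).sum()
    return lhs, rhs
```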

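Next, we state a simple lemma on the distance between weighted measures. We
have already used this result in section 1.1 to bound $|||C_n\rho-C_n\rho'|||$.

Lemma 4.2. Let $\mu,\nu$ be (possibly random) probability measures and let $\Lambda$ be
a bounded and strictly positive measurable function. Define

$$\mu_\Lambda(A):=\frac{\int\mathbf{1}_A(x)\Lambda(x)\,\mu(dx)}{\int\Lambda(x)\,\mu(dx)},\qquad\nu_\Lambda(A):=\frac{\int\mathbf{1}_A(x)\Lambda(x)\,\nu(dx)}{\int\Lambda(x)\,\nu(dx)}.$$

Then

$$\|\mu_\Lambda-\nu_\Lambda\|\le 2\,\frac{\sup_x\Lambda(x)}{\inf_x\Lambda(x)}\,\|\mu-\nu\|.$$

The same conclusion holds if the $\|\cdot\|$-norm is replaced by the $|||\cdot|||$-norm.

Proof. The result follows readily from the identity

$$\mu_\Lambda(f)-\nu_\Lambda(f)=\frac{\mu(f\Lambda)-\nu(f\Lambda)}{\mu(\Lambda)}+\frac{\nu_\Lambda(f)\{\nu(\Lambda)-\mu(\Lambda)\}}{\mu(\Lambda)},$$

using the definition of the norms $\|\cdot\|$ or $|||\cdot|||$. □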
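The identity in the proof can be checked numerically, together with the bound $\|\mu_\Lambda-\nu_\Lambda\|\le 2(\sup\Lambda/\inf\Lambda)\|\mu-\nu\|$ that it yields. The sketch below works on a finite space. The measures, the weight function $\Lambda$ and the test function $f$ with $|f|\le1$ are random inputs chosen only for illustration.

```python
import numpy as np


def weighted(mu, lam):
    """mu_Lambda on a finite space: reweight by Lambda and renormalize."""
    w = mu * lam
    return w / w.sum()


def lemma42_check(m=8, seed=0):
    """Return (identity residual, ||mu_L - nu_L||, 2 sup/inf ||mu - nu||)."""
    rng = np.random.default_rng(seed)
    mu, nu = rng.dirichlet(np.ones(m)), rng.dirichlet(np.ones(m))
    lam = rng.uniform(0.5, 3.0, size=m)          # bounded, strictly positive weight
    f = rng.uniform(-1.0, 1.0, size=m)          # test function with |f| <= 1
    mu_l, nu_l = weighted(mu, lam), weighted(nu, lam)
    identity_rhs = ((mu @ (f * lam) - nu @ (f * lam)) / (mu @ lam)
                    + (nu_l @ f) * (nu @ lam - mu @ lam) / (mu @ lam))
    identity_gap = abs((mu_l @ f - nu_l @ f) - identity_rhs)
    lhs = np.abs(mu_l - nu_l).sum()
    rhs = 2 * lam.max() / lam.min() * np.abs(mu - nu).sum()
    return identity_gap, lhs, rhs
```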

Finally, we give a lemma that will be essential for bounding Dij. In essence, 
the lemma states that if Cij decays exponentially in the distance between i and j 
at a sufficiently rapid rate, then Dij will also decay exponentially in the distance 
between i and j. This is essential in order to establish the decay of correlations 
property using only bounds on Cij , which can be obtained in explicit form. While 
the lemma should be interpreted in the spirit of decay of correlations, it is essentially
a simple lemma about matrices and will be stated as such.
Lemma 4.3. Let $I$ be a finite set and let $m$ be a pseudometric on $I$. Let $C=(C_{ij})_{i,j\in I}$
be a matrix with nonnegative entries. Suppose that

$$\max_{i\in I}\sum_{j\in I}e^{m(i,j)}C_{ij}\le c<1.$$

Then the matrix $D=\sum_{n\ge 0}C^n$ satisfies

$$\max_{i\in I}\sum_{j\in I}e^{m(i,j)}D_{ij}\le\frac{1}{1-c}.$$

In particular, this implies that

$$\sum_{j\in J}D_{ij}\le\frac{e^{-m(i,J)}}{1-c}$$

for every $J\subseteq I$, where $m(i,J):=\min_{j\in J}m(i,j)$.

Proof. Define for any matrix $A$ with nonnegative entries the norm

$$\|A\|_m:=\max_{i\in I}\sum_{j\in I}e^{m(i,j)}A_{ij}.$$

Using $m(i,j)\le m(i,k)+m(k,j)$, we compute

$$\begin{aligned}
\|AB\|_m&=\max_{i\in I}\sum_{j\in I}e^{m(i,j)}\sum_{k\in I}A_{ik}B_{kj}\\
&\le\max_{i\in I}\sum_{k\in I}e^{m(i,k)}A_{ik}\sum_{j\in I}e^{m(k,j)}B_{kj}\\
&\le\|A\|_m\|B\|_m,
\end{aligned}$$

so $\|\cdot\|_m$ is a matrix norm. Therefore,

$$\|D\|_m\le\sum_{n\ge 0}\|C\|_m^n\le\sum_{n\ge 0}c^n=\frac{1}{1-c}.$$

As

$$e^{m(i,J)}\sum_{j\in J}D_{ij}\le\sum_{j\in J}e^{m(i,j)}D_{ij}\le\|D\|_m,$$

the last statement of the lemma follows immediately. □
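Lemma 4.3 can be checked on a simple example. The sketch below assumes sites on a line with pseudometric $m(i,j)=\beta|i-j|$ and a matrix $C$ whose entries decay like $e^{-2m(i,j)}$, which is faster than $e^{-m}$. The parameters are chosen so that the weighted row norm $c$ is below 1. The code computes $D=(I-C)^{-1}$, which equals $\sum_n C^n$ here, and compares its weighted row norm with $1/(1-c)$.

```python
import numpy as np


def weighted_row_norm(A, M):
    """||A||_m = max_i sum_j exp(m(i, j)) A_ij for a nonnegative matrix A and matrix M of m-values."""
    return (np.exp(M) * A).sum(axis=1).max()


def lemma43_check(n=30, beta=0.5, strength=0.25):
    """Return (c, ||D||_m, D, M) for an exponentially decaying nonnegative matrix C."""
    idx = np.arange(n)
    M = beta * np.abs(idx[:, None] - idx[None, :])   # m(i, j) = beta * |i - j|
    C = strength * np.exp(-2 * M)                     # decays faster than exp(-m)
    np.fill_diagonal(C, 0.0)
    c = weighted_row_norm(C, M)
    D = np.linalg.inv(np.eye(n) - C)                  # D = sum_{k >= 0} C^k since c < 1
    return c, weighted_row_norm(D, M), D, M
```

The last statement of the lemma can also be checked. For example, the total mass of row $0$ of $D$ on far-away sites is at most $e^{-m(0,J)}/(1-c)$.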
4.2. Local stability of the filter. The main goal of this section is to prove a
local stability bound for the nonlinear filter. We begin, however, by introducing a
number of objects that will appear several times in the sequel.

For any probability measure $\mu$ on $\mathbb{X}$ and $x,z\in\mathbb{X}$, $v\in V$, we define

$$\mu^v_{x,z}(A):=\frac{\int\mathbf{1}_A(x^v)\prod_{w\in N(v)}p^w(x,z^w)\,\mu^v_x(dx^v)}{\int\prod_{w\in N(v)}p^w(x,z^w)\,\mu^v_x(dx^v)}$$

(recall the notation $\mu^v_x:=\mathbf{P}^\mu[X_0^v\in\cdot\,|\,X_0^{V\setminus\{v\}}=x^{V\setminus\{v\}}]$ in section 3.2). Let

$$C^\mu_{vv'}:=\frac{1}{2}\sup_{x,\tilde x,z\in\mathbb{X}:\,x^{V\setminus\{v'\}}=\tilde x^{V\setminus\{v'\}}}\|\mu^v_{x,z}-\mu^v_{\tilde x,z}\|$$

for $v,v'\in V$. The quantity

$$\mathrm{Corr}(\mu,\beta):=\max_{v\in V}\sum_{v'\in V}e^{\beta d(v,v')}C^\mu_{vv'}$$

could be viewed as a measure of the degree of correlation decay of the measure
$\mu$ at rate $\beta>0$. It will turn out that this (not entirely obvious) measure of decay
of correlations is precisely tuned to the needs of the proof of Theorem 2.1. This is
due to the fact that the measures $\mu^v_{x,z}$ arise naturally when applying the Dobrushin
comparison method to the smoothing distributions as discussed in section 3.2.

Proposition 4.4 (Local filter stability). Suppose there exists $\varepsilon>0$ such that

$$\varepsilon\le p^v(x,z^v)\le\varepsilon^{-1}\quad\text{for all }v\in V,\ x,z\in\mathbb{X}.$$

Let $\mu,\nu$ be probability measures on $\mathbb{X}$, and suppose that

$$\mathrm{Corr}(\mu,\beta)+3(1-\varepsilon^{2\Delta})e^{2\beta r}\Delta^2\le\frac{1}{2}$$

for a sufficiently small constant $\beta>0$. Then we have

$$\|F_n\cdots F_{s+1}\mu-F_n\cdots F_{s+1}\nu\|_J\le 2e^{-\beta(n-s)}\sum_{v\in J}\max_{v'\in V}e^{-\beta d(v,v')}\sup_{x,z\in\mathbb{X}}\|\mu^{v'}_{x,z}-\nu^{v'}_{x,z}\|$$

for every $J\subseteq V$ and $s\le n$.
Remark 4.5. There is nothing magical about the constant 1/2 in the decay
of correlations assumption; any constant c < 1 would work at the expense of a 
constant 1/(1 — c) rather than 2 in the filter stability bound. As our methods are not 
expected to yield tight quantitative bounds, we have taken the liberty to fix various 
constants of this sort throughout the following sections for aesthetic purposes. 



Remark 4.6. Note that by Lemma 4.2

$$\|\mu^v_{x,z}-\nu^v_{x,z}\|\le\frac{2}{\varepsilon^{2\Delta}}\|\mu^v_x-\nu^v_x\|.$$

This yields a slightly cleaner bound in Proposition 4.4 with a worse constant. For
our purposes, however, it will be just as easy to bound $\|\mu^v_{x,z}-\nu^v_{x,z}\|$ directly.

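Proof. Define the smoothing distributions

$$\rho=\mathbf{P}^\mu[X_0,\ldots,X_n\in\cdot\,|\,Y_1,\ldots,Y_n],$$

$$\tilde\rho=\mathbf{P}^\nu[X_0,\ldots,X_n\in\cdot\,|\,Y_1,\ldots,Y_n].$$

We will apply Theorem 3.1 to $\rho,\tilde\rho$ with $I=\{0,\ldots,n\}\times V$ and $\mathbb{S}=\mathbb{X}^{n+1}$ as
discussed in section 3.2. To this end, we must bound the quantities $C_{ij}$ and $b_j$. We begin
by bounding $C_{ij}$ with $i=(k,v)$ and $j=(k',v')$. We distinguish three cases.

Case $k=0$. The key observation in this case is that $\rho^i_x=\mu^v_{x_0,x_1}$ by the Markov
property (or by direct computation). Note that as $\mathrm{card}\,N(v)\le\Delta$, we have

$$\mu^v_{x,z}(A)\ge\varepsilon^{2\Delta}\mu^v_x(A),$$

so $\|\mu^v_{x,z}-\mu^v_{x,z'}\|\le 2(1-\varepsilon^{2\Delta})$ by Lemma 4.1. Therefore

$$C_{ij}\le\begin{cases}C^\mu_{vv'}&\text{if }k'=0,\\1-\varepsilon^{2\Delta}&\text{if }k'=1\text{ and }v'\in N(v),\\0&\text{otherwise.}\end{cases}$$

This evidently implies that

$$\sum_{(k',v')\in I}e^{\beta k'}e^{\beta d(v,v')}C_{(0,v)(k',v')}\le\mathrm{Corr}(\mu,\beta)+(1-\varepsilon^{2\Delta})e^{\beta(r+1)}\Delta.$$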

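Case $0<k<n$. Now we have (cf. section 3.2)

$$\rho^i_x(A)=\frac{\int\mathbf{1}_A(x_k^v)\,p^v(x_{k-1},x_k^v)\,g^v(x_k^v,Y_k^v)\prod_{w\in N(v)}p^w(x_k,x_{k+1}^w)\,\psi^v(dx_k^v)}{\int p^v(x_{k-1},x_k^v)\,g^v(x_k^v,Y_k^v)\prod_{w\in N(v)}p^w(x_k,x_{k+1}^w)\,\psi^v(dx_k^v)}.$$

By inspection, $\rho^i_x$ does not depend on $x_{k'}^{v'}$, except in the following cases: $k'=k-1$
and $v'\in N(v)$; $k'=k+1$ and $v'\in N(v)$; $k'=k$ and $v'\in\bigcup_{w\in N(v)}N(w)$. As

$$\rho^i_x(A)\ge\varepsilon^2\,\frac{\int\mathbf{1}_A(x_k^v)\,g^v(x_k^v,Y_k^v)\prod_{w\in N(v)}p^w(x_k,x_{k+1}^w)\,\psi^v(dx_k^v)}{\int g^v(x_k^v,Y_k^v)\prod_{w\in N(v)}p^w(x_k,x_{k+1}^w)\,\psi^v(dx_k^v)}$$

as well as

$$\rho^i_x(A)\ge\varepsilon^{2\Delta}\,\frac{\int\mathbf{1}_A(x_k^v)\,p^v(x_{k-1},x_k^v)\,g^v(x_k^v,Y_k^v)\,\psi^v(dx_k^v)}{\int p^v(x_{k-1},x_k^v)\,g^v(x_k^v,Y_k^v)\,\psi^v(dx_k^v)},$$

we can use Lemma 4.1 to estimate

$$C_{ij}\le\begin{cases}1-\varepsilon^2&\text{if }k'=k-1\text{ and }v'\in N(v),\\1-\varepsilon^{2\Delta}&\text{if }k'=k+1\text{ and }v'\in N(v),\\1-\varepsilon^{2\Delta}&\text{if }k'=k\text{ and }v'\in\bigcup_{w\in N(v)}N(w),\\0&\text{otherwise.}\end{cases}$$

This yields

$$\sum_{(k',v')\in I}e^{\beta|k-k'|}e^{\beta d(v,v')}C_{(k,v)(k',v')}\le 3(1-\varepsilon^{2\Delta})e^{2\beta r}\Delta^2,$$

where we have used that $r\ge 1$ and $\Delta\ge 1$ in the last inequality.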
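Case $k=n$. Now we have

$$\rho^i_x(A)=\frac{\int\mathbf{1}_A(x_n^v)\,p^v(x_{n-1},x_n^v)\,g^v(x_n^v,Y_n^v)\,\psi^v(dx_n^v)}{\int p^v(x_{n-1},x_n^v)\,g^v(x_n^v,Y_n^v)\,\psi^v(dx_n^v)}\ge\varepsilon^2\,\frac{\int\mathbf{1}_A(x_n^v)\,g^v(x_n^v,Y_n^v)\,\psi^v(dx_n^v)}{\int g^v(x_n^v,Y_n^v)\,\psi^v(dx_n^v)},$$

and we obtain precisely as above

$$C_{ij}\le\begin{cases}1-\varepsilon^2&\text{if }k'=n-1\text{ and }v'\in N(v),\\0&\text{otherwise.}\end{cases}$$

We therefore find

$$\sum_{(k',v')\in I}e^{\beta|n-k'|}e^{\beta d(v,v')}C_{(n,v)(k',v')}\le(1-\varepsilon^2)e^{\beta(r+1)}\Delta.$$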
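Combining the above three cases and the assumption of the Proposition yields

$$\max_{(k,v)\in I}\sum_{(k',v')\in I}e^{\beta\{|k-k'|+d(v,v')\}}C_{(k,v)(k',v')}\le\frac{1}{2}.$$

Thus Lemma 4.3 gives

$$\max_{(k,v)\in I}\sum_{(k',v')\in I}e^{\beta\{|k-k'|+d(v,v')\}}D_{(k,v)(k',v')}\le 2.$$

Now consider the quantities $b_j$ in Theorem 3.1. By the Markov property, it is evident
that $\rho^i_x=\tilde\rho^i_x$ whenever $i=(k,v)$ with $k\ge 1$. On the other hand, for $k=0$
we obtain $\rho^i_x=\mu^v_{x_0,x_1}$ and $\tilde\rho^i_x=\nu^v_{x_0,x_1}$. Applying Theorem 3.1 therefore yields

$$\|\pi_n^\mu-\pi_n^\nu\|_J=\|\rho-\tilde\rho\|_{\{n\}\times J}\le\sum_{v\in J}\sum_{v'\in V}D_{(n,v)(0,v')}\sup_{x,z\in\mathbb{X}}\|\mu^{v'}_{x,z}-\nu^{v'}_{x,z}\|.$$

However, note that

$$\begin{aligned}
\sum_{v'\in V}D_{(n,v)(0,v')}\sup_{x,z\in\mathbb{X}}\|\mu^{v'}_{x,z}-\nu^{v'}_{x,z}\|
&=e^{-\beta n}\sum_{v'\in V}e^{\beta\{n+d(v,v')\}}D_{(n,v)(0,v')}\,e^{-\beta d(v,v')}\sup_{x,z\in\mathbb{X}}\|\mu^{v'}_{x,z}-\nu^{v'}_{x,z}\|\\
&\le 2e^{-\beta n}\max_{v'\in V}e^{-\beta d(v,v')}\sup_{x,z\in\mathbb{X}}\|\mu^{v'}_{x,z}-\nu^{v'}_{x,z}\|
\end{aligned}$$

using the above estimate on the matrix $D$. Substituting this into the bound for
$\|\pi_n^\mu-\pi_n^\nu\|_J$ yields the statement of the Proposition for the special case $s=0$.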

To obtain the result for any $s<n$, note that $F_n\cdots F_{s+1}\mu$ and $\pi_{n-s}^\mu$ differ only
in that a different sequence of observations ($Y_{s+1},\ldots,Y_n$ versus $Y_1,\ldots,Y_{n-s}$) is
used in the computation of these quantities. As our bound holds uniformly in the
observation sequence, however, the general result follows immediately. □

As a corollary of Proposition 4.4, let us derive a simple filter stability statement 
that illustrates the role of decay of correlations (this will not be used elsewhere). 

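Corollary 4.7 (Filter stability). Suppose there exists $\varepsilon>0$ such that

$$\varepsilon\le p^v(x,z^v)\le\varepsilon^{-1}\quad\text{for all }v\in V,\ x,z\in\mathbb{X},$$

and such that

$$\varepsilon>\varepsilon_0=\Big(1-\frac{1}{6\Delta^2}\Big)^{1/2\Delta}.$$

Then for any probability measures $\mu,\nu$ on $\mathbb{X}$ and $J\subseteq V$, $n\ge 0$, we have

$$\|\pi_n^\mu-\pi_n^\nu\|_J\le 4\,\mathrm{card}\,J\,\gamma^{n/2r},$$

where $\gamma=6\Delta^2(1-\varepsilon^{2\Delta})<1$.

Proof. We first apply Proposition 4.4 with $\mu=\delta_x$. Then $\mathrm{Corr}(\mu,\beta)=0$ for
any $\beta>0$. Choosing $\beta=-(2r)^{-1}\log\gamma>0$, we find that

$$\mathrm{Corr}(\mu,\beta)+3(1-\varepsilon^{2\Delta})e^{2\beta r}\Delta^2=\frac{1}{2},$$

so that the assumption of Proposition 4.4 is satisfied. Therefore,

$$\|\pi_n^x-\pi_n^\nu\|_J\le 4\,\mathrm{card}\,J\,e^{-\beta n}=4\,\mathrm{card}\,J\,\gamma^{n/2r}.$$

To obtain the result for arbitrary $\mu$, note that

$$\begin{aligned}
\pi_n^\mu(A)&=\mathbf{P}^\mu[X_n\in A\,|\,Y_1,\ldots,Y_n]\\
&=\mathbf{E}^\mu[\mathbf{P}^\mu[X_n\in A\,|\,X_0,Y_1,\ldots,Y_n]\,|\,Y_1,\ldots,Y_n]\\
&=\mathbf{E}^\mu[\pi_n^{X_0}(A)\,|\,Y_1,\ldots,Y_n].
\end{aligned}$$

Therefore, by Jensen's inequality,

$$\|\pi_n^\mu-\pi_n^\nu\|_J\le\mathbf{E}^\mu[\|\pi_n^{X_0}-\pi_n^\nu\|_J\,|\,Y_1,\ldots,Y_n]\le\sup_{x\in\mathbb{X}}\|\pi_n^x-\pi_n^\nu\|_J,$$

which yields the result. □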
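The constants in Corollary 4.7 are explicit, so the arithmetic of the proof can be checked directly. The sketch below takes sample values of $\Delta$, $r$ and $\varepsilon$, chosen only for illustration. It computes $\varepsilon_0$, $\gamma$ and the rate $\beta=-(2r)^{-1}\log\gamma$. It then confirms that this choice of $\beta$ makes $3(1-\varepsilon^{2\Delta})e^{2\beta r}\Delta^2$ exactly $1/2$; recall that $\mathrm{Corr}(\delta_x,\beta)=0$.

```python
import math


def corollary47_constants(delta, r, eps):
    """Return (eps0, gamma, beta, 3(1 - eps^{2 delta}) e^{2 beta r} delta^2)."""
    eps0 = (1 - 1 / (6 * delta ** 2)) ** (1 / (2 * delta))
    if eps <= eps0:
        raise ValueError("Corollary 4.7 requires eps > eps0")
    gamma = 6 * delta ** 2 * (1 - eps ** (2 * delta))
    beta = -math.log(gamma) / (2 * r)
    lhs = 3 * (1 - eps ** (2 * delta)) * math.exp(2 * beta * r) * delta ** 2
    return eps0, gamma, beta, lhs


def stability_bound(card_j, n, gamma, r):
    """Right-hand side 4 card J gamma^(n / 2r) of Corollary 4.7."""
    return 4 * card_j * gamma ** (n / (2 * r))
```

Even for moderate $\Delta$ the threshold $\varepsilon_0$ is close to 1. For $\Delta=3$ it is about $0.997$, which illustrates how strong the mixing requirement of the corollary is.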



While Proposition 4.4 requires a decay of correlations assumption on the initial
condition ($\mathrm{Corr}(\mu,\beta)$ must be sufficiently small), Corollary 4.7 works for any
initial condition provided that $\varepsilon>\varepsilon_0$ is sufficiently large (which is necessary in
general, see section 2.3.1). Thus no assumption is needed on the initial condition
if we want to show only that the filter is stable in time. On the other hand, Proposition 4.4
controls not only the stability in time, but also the spatial accumulation
of error between $\mu$ and $\nu$ by virtue of the damping factor $e^{-\beta d(v,v')}$: the decay of
correlations property of the initial condition is essential to obtain this type of local
control. The latter is of central importance if we wish to obtain local error bounds
for filter approximations that are uniform in time and in the model dimension.
4.3. The block projection error. The proof of a time-uniform error bound between
$\pi_n$ and $\tilde\pi_n$ requires two ingredients: we need the filter stability property of
$\pi_n$, developed in the previous section, in order to mitigate the accumulation of
approximation errors over time; and we need to control the approximation error
between $\pi_n$ and $\tilde\pi_n$ in one time step. The latter is the purpose of this section.

We will in fact consider two separate cases. To control the total error $\|\pi_n^\mu-\tilde\pi_n^\mu\|_J$,
we need to consider the one-step error made in each time step $s=1,\ldots,n$. For
time steps $s<n$ (for which the error is dissipated by the stability of the filter), the
error must be measured in terms of the quantities that appear in Proposition 4.4:
that is, we must control $\|(F_s\nu)^v_{x,z}-(\tilde F_s\nu)^v_{x,z}\|$. On the other hand, in the last time
step $s=n$, we must control directly $\|F_n\nu-\tilde F_n\nu\|_J$. While the proofs of these
cases are quite similar, each must be considered separately in the following.

We begin by bounding the error in time steps $s<n$.

Proposition 4.8 (Block error, $s<n$). Suppose there exists $\varepsilon>0$ such that

$$\varepsilon\le p^v(x,z^v)\le\varepsilon^{-1}\quad\text{for all }v\in V,\ x,z\in\mathbb{X}.$$

Let $\nu$ be a probability measure on $\mathbb{X}$, and suppose that

$$\mathrm{Corr}(\nu,\beta)+(1-\varepsilon^2)e^{\beta(r+1)}\Delta\le\frac{1}{2}$$

for a sufficiently small constant $\beta>0$. Then we have

$$\sup_{x,z\in\mathbb{X}}\|(F_s\nu)^v_{x,z}-(\tilde F_s\nu)^v_{x,z}\|\le 4e^{-\beta}(1-\varepsilon^{2\Delta})e^{-\beta d(v,\partial K)}$$

for every $s\in\mathbb{N}$, $K\in\mathcal{K}$ and $v\in K$.

This result makes precise the idea that was heuristically expressed in section 2.2:
if the measure $\nu$ possesses the decay of correlations property, then the error at site $v$
incurred by applying the block filter rather than the true filter decays exponentially
in the distance between $v$ and the boundary of the block that it is in.

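Proof. We begin by writing out the definitions

$$F_s\nu(A)=\frac{\int\mathbf{1}_A(x)\int\prod_{v\in V}p^v(x_0,x^v)\,g^v(x^v,Y_s^v)\,\nu(dx_0)\,\psi(dx)}{\int\int\prod_{v\in V}p^v(x_0,x^v)\,g^v(x^v,Y_s^v)\,\nu(dx_0)\,\psi(dx)},$$

$$\tilde F_s\nu(A)=\frac{\int\mathbf{1}_A(x)\prod_{K'\in\mathcal{K}}\Big[\int\prod_{v\in K'}p^v(x_0,x^v)\,g^v(x^v,Y_s^v)\,\nu(dx_0)\Big]\psi(dx)}{\int\prod_{K'\in\mathcal{K}}\Big[\int\prod_{v\in K'}p^v(x_0,x^v)\,g^v(x^v,Y_s^v)\,\nu(dx_0)\Big]\psi(dx)}.$$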
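Let us fix $K\in\mathcal{K}$, $v\in K$ and $x,z\in\mathbb{X}$ throughout the proof.

Define $I=(\{0\}\times V)\cup\{(1,v)\}$ and $\mathbb{S}=\mathbb{X}\times\mathbb{X}^v$, and the probability measures on $\mathbb{S}$

$$\rho(A)=\frac{\int\mathbf{1}_A(x_0,x^v)\,g^v(x^v,Y_s^v)\prod_{v'\in V}p^{v'}(x_0,x^{v'})\prod_{w\in N(v)}p^w(x,z^w)\,\nu(dx_0)\,\psi^v(dx^v)}{\int g^v(x^v,Y_s^v)\prod_{v'\in V}p^{v'}(x_0,x^{v'})\prod_{w\in N(v)}p^w(x,z^w)\,\nu(dx_0)\,\psi^v(dx^v)},$$

$$\tilde\rho(A)=\frac{\int\mathbf{1}_A(x_0,x^v)\,g^v(x^v,Y_s^v)\prod_{v'\in K}p^{v'}(x_0,x^{v'})\prod_{w\in N(v)}p^w(x,z^w)\,\nu(dx_0)\,\psi^v(dx^v)}{\int g^v(x^v,Y_s^v)\prod_{v'\in K}p^{v'}(x_0,x^{v'})\prod_{w\in N(v)}p^w(x,z^w)\,\nu(dx_0)\,\psi^v(dx^v)}$$

(here $x$ inside the integrals denotes the configuration whose $v$-coordinate is the integration
variable $x^v$ and whose remaining coordinates are those of the fixed point $x$).
Then we have by construction

$$\|(F_s\nu)^v_{x,z}-(\tilde F_s\nu)^v_{x,z}\|=\|\rho-\tilde\rho\|_{\{(1,v)\}}.$$

We will apply Theorem 3.1 to bound $\|\rho-\tilde\rho\|_{\{(1,v)\}}$. To this end, we must bound $C_{ij}$
and $b_i$ with $i=(k',v')$ and $j=(k'',v'')$. We distinguish two cases.

Case $k'=0$. In this case we have

$$\rho^i_{(x_0,x^v)}(A)=\frac{\int\mathbf{1}_A(x_0^{v'})\prod_{w\in N(v')}p^w(x_0,x^w)\,\nu^{v'}_{x_0}(dx_0^{v'})}{\int\prod_{w\in N(v')}p^w(x_0,x^w)\,\nu^{v'}_{x_0}(dx_0^{v'})},$$

$$\tilde\rho^i_{(x_0,x^v)}(A)=\frac{\int\mathbf{1}_A(x_0^{v'})\prod_{w\in N(v')\cap K}p^w(x_0,x^w)\,\nu^{v'}_{x_0}(dx_0^{v'})}{\int\prod_{w\in N(v')\cap K}p^w(x_0,x^w)\,\nu^{v'}_{x_0}(dx_0^{v'})}.$$

In particular, $\rho^i_{(x_0,x^v)}=\nu^{v'}_{x_0,x}$, so $C_{ij}\le C^\nu_{v'v''}$ if $k''=0$. Moreover, as

$$\rho^i_{(x_0,x^v)}(A)\ge\varepsilon^2\,\frac{\int\mathbf{1}_A(x_0^{v'})\prod_{w\in N(v')\setminus\{v\}}p^w(x_0,x^w)\,\nu^{v'}_{x_0}(dx_0^{v'})}{\int\prod_{w\in N(v')\setminus\{v\}}p^w(x_0,x^w)\,\nu^{v'}_{x_0}(dx_0^{v'})},$$

we have $C_{ij}\le 1-\varepsilon^2$ if $k''=1$ (so $v''=v$) and $v'\in N(v)$ by Lemma 4.1, and
$C_{ij}=0$ otherwise. We therefore immediately obtain the estimate

$$\sum_{(k'',v'')\in I}e^{\beta k''}e^{\beta d(v',v'')}C_{(0,v')(k'',v'')}\le\mathrm{Corr}(\nu,\beta)+(1-\varepsilon^2)e^{\beta(r+1)}.$$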
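On the other hand, note that $\rho^i=\tilde\rho^i$ if $N(v')\subseteq K$, and that we have
$\rho^i_{(x_0,x^v)}\ge\varepsilon^{2\Delta}\nu^{v'}_{x_0}$ and $\tilde\rho^i_{(x_0,x^v)}\ge\varepsilon^{2\Delta}\nu^{v'}_{x_0}$. Therefore, by Lemma 4.1

$$b_i=\sup_{(x_0,x^v)\in\mathbb{S}}\|\rho^i_{(x_0,x^v)}-\tilde\rho^i_{(x_0,x^v)}\|\le\begin{cases}0&\text{for }v'\in K\setminus\partial K,\\2(1-\varepsilon^{2\Delta})&\text{otherwise.}\end{cases}$$

Case $k'=1$. In this case we have

$$\rho^i_{(x_0,x^v)}(A)=\tilde\rho^i_{(x_0,x^v)}(A)=\frac{\int\mathbf{1}_A(x^v)\,p^v(x_0,x^v)\,g^v(x^v,Y_s^v)\prod_{w\in N(v)}p^w(x,z^w)\,\psi^v(dx^v)}{\int p^v(x_0,x^v)\,g^v(x^v,Y_s^v)\prod_{w\in N(v)}p^w(x,z^w)\,\psi^v(dx^v)}.$$

Thus $b_i=0$, and estimating as above we obtain $C_{ij}\le 1-\varepsilon^2$ whenever $k''=0$
and $v''\in N(v)$, and $C_{ij}=0$ otherwise. In particular, we obtain

$$\sum_{(k'',v'')\in I}e^{\beta|1-k''|}e^{\beta d(v,v'')}C_{(1,v)(k'',v'')}\le(1-\varepsilon^2)e^{\beta(r+1)}\Delta.$$

Combining the above two cases and the assumption of the Proposition yields

$$\max_{(k',v')\in I}\sum_{(k'',v'')\in I}e^{\beta\{|k'-k''|+d(v',v'')\}}C_{(k',v')(k'',v'')}\le\frac{1}{2}.$$

Applying Theorem 3.1 and Lemma 4.3 gives

$$\|\rho-\tilde\rho\|_{\{(1,v)\}}\le 2(1-\varepsilon^{2\Delta})\sum_{v'\in V\setminus(K\setminus\partial K)}D_{(1,v)(0,v')}\le 4e^{-\beta}(1-\varepsilon^{2\Delta})e^{-\beta d(v,\partial K)}.$$

As the choice of $x,z\in\mathbb{X}$ was arbitrary, the proof is complete. □
We now use a similar argument to bound the error in time step $n$.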
Proposition 4.9 (Block eiTor, s = n). Suppose there exists e > such that 

e <p"{x,z'') <£~^ forallv£V,x,z£X. 
Let u be a probability measure on X, and suppose that 

CorT{u,/3) + (1 - e2)e^('^+^)A < ^ 
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for a sufficiently small constant /3 > 0. Then we have 

||F„z.- F„z.||j < 46-^^(1 -e2A)g-MJ,ai^) ^^^^j 

for every K £ % and J K. 

Proof. Define I = {0,1} xV and S = X^. Fix K e %, and let 



P{A) 



I Uvsv P"(^o, x\) g''{xl,Y-) u{dxo) i;{dxi) 



/ UveKP^'i^o, x\) n^gy g'"ixr,Y;f) v{dxo) Hdxi) 
Then for any J (1 K,^e. have 

||F„z^- F„z^||j = ||/5-p||{i}xj- 

We will apply Theorem 3.1 to bound \\p — p\\[i}xj- To this end, we must bound 
Cij and hi with i = {k, v) and j = {k', v'). We distinguish two cases. 
Case A; = 0. In this case we have 



!W.,^N(v)P'"i^O,X^)vl^{dxl) ' 

/ IaK) n.eiV(.)nKP"(^o, x^) vl^{dxl) 



!W^^N(v)nKP'"{^O.X^)ul^^{dxl) 

In particular, = i^xo,xi' ^ij < C^„, if k' = 0. Moreover, as 



pUA) > e 



2 JlAK)n^g^(.)\K}P"(^o,<X(^x^) 

I Ilw€Niv)\{v'} P"" i^O, Xf) I'^gidx^) 



we have Cij < 1 — if k' = I and v' G A^(?;) by Lemma 4.1, and Cjj = 
otherwise. We therefore immediately obtain the estimate 

Y: e^''e^"^^'^'^C^o,.W,.') < Corr(., /3) + (1 - e')ef^^^+'^ A. 
{k',v')ei 

On the other hand, note that = pi. if A^(t;) C A', and that we have pi- > e'^^uJ^^ 
and pI > 

^"^^^XQ - Therefore, we obtain by Lemma 4.1 

„ i ~iu ^ fo ^orve K\dK, 

bi = snp\\px- pj< i . 
xe§ I 2(1 - e^^) otherwise. 
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Case k = 1. In this case we have 

^ Jp^ixo,x\)g-ixl,Y-)r{dxl) ' 

while pf = pfifv^K and 

~^ ^ flA{x\)g-{x\,Y;:)r{dx\) 
^ !gHx\,Y-)r{dx\) ' 

otherwise. Thus we obtain from Lemma 4. 1 

II * -Ml / Jo for^;Gi^, 
bi = sup \\p^ - pj < < . 
xg§ 12(1 — e^j otherwise. 

On the other hand, we can readily estimate as above 

Combining the above two cases and the assumption of the Proposition yields 

^ ' ' (k',v')ei 
Applying Theorem 3.1 and Lemma 4.3 gives 

\\fnV - fnt^h = \\P - P||{l}xJ 

< 2(1 - e^'^) E \ E D{i,v)io,v') + E f 

<4e-''(l-e2A)e-M^'9^)cardJ 

for every J (1 K. □ 

4.4. Decay of correlations of the block filter. To idea behind the block filter 
Tin is that the error should decay exponentially in the block size by virtue of the 
decay of coiTclations property. While we have developed above the two ingredients 
(filter stability and one-step error bound) required to obtain a time-uniform error 
bound between vr^ and fr^, we have done this by imposing the decay of correlations 
property as an assumption. Thus perhaps the crucial point remains to be proved: 
we must show that decay of coixelations does indeed hold, that is, that Corr(7r^, /3) 
can be controlled uniformly in time. This is the goal of the present section. 
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Unfortunately, Corr(7f^^, /3) is not straightforward to control directly. We there- 
fore introduce an alternative measure of correlation decay that will be easier to 
control. For any probability measure /i on X and x, z X, v V, K J(.,let 

We now define 

(^^'^ 9 max sup sup ll/^xU-%JI 

for G y. The quantity 

C^(m,/3) := max ^ e'^'^'^^'^'^C'^;, 

is a measure of correlation decay that is well adapted to the block filter. In order 
for this quantity to be useful, we must first show that it controls Corr(/i, /3). 

Lemma 4.10. For any probability measure /i and f3 > 0, we have 

Corr(/i, /?) < (1 - e2^)e2^^A2 + 2e~'^^ Chfiifi, /3). 

Proof. By definition 

Let X, X G X be such that x^^'t'^'^ = x^\'t^'}. If v' Ut„e7V(i>) ^('^)' *en 

IIA^XjZ ~ A*X,zll — ~ i^X,Z II 

by Lemma 4.2. On the other hand, note that 

y^lM) > e'^/i^;f (^), f^lM) > e'^/^i'J (^)- 
We can therefore estimate using Lemma 4.1 for v' € [jweN{v) ^{w) 

h% - i^lj < 2(1 - e'^) + e'^Mf - nl'^l 
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Thus we obtain 

Corr(/i, /3) < (1 - e^^) max V e'^'^^"'""' + 2e-^^ C^r{fi, /3) 

vGV ^ — ' 

'''6U„GiV(.)^(«') 

< (1 _ e2^)e2/5^A2 + 2e-2^ C^(^, 
As and /3 were arbitrary, the proof is complete. □ 

We now aim to estabhsh a time-uniform bound on Corr(7fn, /3). To this end, we 
first prove a one-step bound which will subsequently be iterated. 

Proposition 4.11. Suppose there exists e > such that 

e<p'"{x,z'') <e-^ for ally £V, x,z £X. 
Let V be a probability measure on X, and suppose that 

C^y{v, /3) + (1 - e2)e/3('^+i) A < ^ 
for a sufficiently small constant /3 > 0. Then we have 

C^(F,i/,/3) < 2(1 - e2A)e2/3r^2 

for any s E N. 

Proof. Let K,K' eX,v & K,v' (v' ^ v), and let z,x,x £X such that 
j.v\{v } _ ^v\{v } These choices will be fixed until further notice. 
Define / = ({0} xV)U (1, t;) and § = X x X'', and let 

p{A) = 
p(A) = 

Then we have by construction 

IKM^'f -(Ml;f || = ||/>-p||(i,„). 

We will apply Theorem 3.1 to bound \\p — p||(i ,;). To this end, we must bound Cij 
and bi with i = {k, t) and j = {k', t'). We distinguish two cases. 
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Case fc = 0. In this case we have 



Note that p*^^^ = Ux^x- We therefore have Cij < C^^, when k' = 0. Moreover, 
i (A\> 2 ■/'■'■^(^o)nioGAr(t)n(i<:\{i;})^''"(^o,a;"')^'*Q((ixo) 

imphes Cij < 1 — if k' = 1 and v £ N{t) by Lemma 4. 1, and Cij = otherwise. 
On the other hand, note that as x^^^^''^ = x^\^^'^ we have pi „s = pi „s if 
v' iV(t) n K, while both and dominate 



/n 

Therefore, by Lemma 4. 1 

(o,t) < j2(i _£2) otherwise. 
Case A: = 1. In this case we have 

^ / Ujx^ g^{x\Y-)p-{xo,x-) UueNiv)nK' P^^, V^n^^") 

Estimating as above, we obtain Cij < 1 — whenever k' = and G N{v), and 
Cjj = otherwise. Similarly, arguing again as above, we obtain 



< 



for^^' 0U»g7V(t;)ni^'^H> 

2(1 - e^A) otherwise. 



Define the matrix {Cjj(f )}ijg/ with the following entries: 

C{0,t)(l,v){v) = C(i^^)(o,t)(u) = (1 - e^)liG7V(t,), 
C(l,j;){l,ti)(^^) = 0. 
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Combining the above two cases yields Cij < Cij{v), and we readily compute 

E e'^^'-''^^'^'''^^C^km',t'){v) < C^r(.,/3) + (1 - e2)e^('-+^)A < \ 

(k',t')&I 

where we have used the assumption of the Proposition. By Theorem 3.1 

IKM^f -(M^f ll = llp-p||(M) 

< 2(1 - e^) l^/eft: ^ ^(i,,;)(o,i')('^) 
t'eN(v') 

+ 2(1 - e^A) N(w) Dii,y)(i,v) (v) 

where D{v) := J2n>o C{v)'^. But note that the right-hand side does not depend 
on K' or z, x, x (provided x^\^'"'^ = x^^^^' J^). We therefore obtain 

t'eN{v') 

for every K £ X,v £ K, and v' £ V. 
To proceed, we note that 

Y: e^'^^-'^ctr < (1 - e^) E ^''^''-'^ E ^(M)(o,n(-) 
■u'gv t>'G-ft: t'eN{v') 

< (1 - e2^)e2^^'A2 E 

Applying Lemma 4.3 to C{v) yields the result. □ 
We now iterate the above result. 

Corollary 4.12. Suppose there exists e > such that 

£ <p"{x,z") < forallv£V,x,z£X, 

and such that 

e > eo = f 1 



I \ 1/2A 



16A2 
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Let fj, be a probability measure on X such that 

C^(^,/3) < \, 
where (3 = -(2r)-i log 16A2(1 - e"^^) > 0. Then 

/3) < - for all n > 0. 

8 

In particular, the latter holds whenever fi = 6x for any x G X. 
Proof. The assumption e > £o implies /3 > and 

(1 _ e2)g/3(r+l)^ < 1^ 

^ ' - 16 

Therefore, if Corr(i/, /?) < 1/8, then Proposition 4.11 yields 

C^(F,i/,/3) < 2(1 - e2A)e2/3-A2 < 1. 

8 

Thus if Corr(^,/3) < 1/8, then Corr(7r^,/3) < 1/8 for all n > 0. Moreover, as 
Corr{6x, P) = 0, the result hold automatically for fi = 6x- □ 

We finally obtain the requisite bound on Corr(7r^^, /3) using Lemma 4.10. 

Corollary 4.13 (Decay of coixelations). Suppose there exists e > with 

e < p"{x, z") < for all v G V, x, z E X, 

such that 

I \ 1/2A 
I6A2J 

Let /3 = -(2r)-^ log 16A2(1 - e"^^) > 0. TJien 

Corr(7f^,/3)<i 

for every n > and x G X. 

Proof. By Corollary 4. 12 and Lemma 4. 10, we can estimate 

Corr(vf;?,/3)<l + ie"2A< 1 

where we used that e^^ > 1 - 1/16. □ 



e > eo 
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4.5. Bounding the bias. In the previous sections, we have proved a local filter 
stability bound (Proposition 4.4), a local one-step error bound (Propositions 4.8 
and 4.9), and decay of correlations of the block filter (Corollary 4.13). We can now 
combine these results to obtain a time-unifoiTn enor bound between the filter and 
the block filter; this contiols the bias of the block paiticle filtering algorithm. 

Theorem 4.14 (Bias term). Suppose there exists e > such that 
e <p"{x,z'") <e-^ forallv £V, x,z £X, 

and such that 

e > eo = I 1 



V 18A2 

Let^ = -(2r)-Mogl8A2(l - e^^) > 0. Then 

IK - < (1 - ^'^) card Je-/^'^(^'^^) 

for every n>0, xGX, and J C K. 

Proof. We begin with the elementary error decomposition 

n 

\\K - ^n\\j <^\\^n--- " F„ • • • f sT^s-l\\j ■ 

s=l 

We will bound each term in the sum. 

Case s = n. To bound this term, note that 

Corr(7f^i,/3) + (1 - e')e^^^+'^A < i + 1 < i 
by CoroUary 4.13. Therefore, applying Proposition 4.9 with u = tt^_i, we obtain 
\\fnK-i - Fn^n-illj < 46"^ (1 - ff^A) g-M'^.e^) card J. 
Case s < n. To bound this term, note that by Corollary 4.13 

Corr(v^,^/?) + 3(l-.2^)e2/^^A2<i + i = i 
Applying Proposition 4.4 with /x = vff and u = Fsfrjl^ yields 
II F„, • • • F,^i Fs7f?„i — Fn. • • • F,^i Fcvf^ 



<2e-/^("-)J]maxe-^'^('''''') sup \\{fs^L^)t - i^s^^s-i) 



V I 



x^zd 
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On the other hand, as by Corollary 4.13 



Corr(7f,-_i,/3) + (1 - e')e^(^-+'^A < 1 + 1 < i 
we have by Proposition 4.8 with u = TTg_i 

sup \\{fs^U)t - i^s^U)t\\ < - e'^) e-^'^^^^'l 

We therefore obtain the estimate 

||F„ • • • Fs+iFs7r^_i - F„ • • • Fs+iFs7f^_i|| J 

< 8e-^il - e^A) e-/3{"--)e-^'^(-^'^^) card J, 

where we have used d{v, v') + d{v' , dK) > d{v, dK). 

Substituting the above two cases into the error decomposition and summing the 
geometric series yields the statement of the Theorem. □ 

4.6. Local stability of the block filter. As was explained in section 3.4, the chief 
difficulty in obtaining a time-uniform bound on the variance term is to establish 
stability of the block filter. This will be done in the present section. 

We first establish a stability bound for nonrandom initial conditions. 

Proposition 4.15. Suppose there exists e > such that 

£ < p"{x, z") < e"^ for all v ^V, x,z £X, 

and such that 



e > eo = 1 



I n1/2A 



6A2 

Let 13 = - log6A2(l - £2^^) > 0. Then 

||F„ • • • F,+i5, - F„ • • • F,+i5,,||j < 4 card J e-^^'^-^) 
for every s < n, z, z' £ \, K £ %, and J C K. 

Proof. Fix throughout the proof n > 0, K £ %, and J O K. We will also 
assume throughout the proof for notational simplicity that s = (the ultimate 
conclusion will extend to any s < n as in the proof of Proposition 4.4). 

We begin by constructing the computation tree as explained in section 3.4. For 
future reference, let us work first in the more general setting where the initial distri- 
butions fi = <S>K'eiX /^^^ ^^'^ ^ ~ ^K'ex independent across the blocks 
(rather than the special case of point masses 8x and 5^/). Define for K' £% 

N{K') = {K" G % : d{K', K") < r}, 
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that is, N{K') is the collection of blocks that interact with block K' in one step of 
the dynamics (recall that card N{K') < Ajc). Then we can evidently write 

B^^^'F./. = CfP^' (g) 

K"eN(K') 

where we have defined for any probability rj on X^' 



and for any probability r? on -^iC'dNiK') ^" 



v&K' 



We therefore have 



B^F„---FiM 



i^„_ie7V(K) 



(~K„-2pK„-2 



KieN{K2) 



KoeN{Ki) 



The structure of the computation tree is now readily visible in this expression. To 
formalize the constmction, we introduce the tree index set 



T := {[Ku ■ ■ ■ Kn-i] : < n < n, e N{Ks+i) for u < s < n} U {[0]} 

where we write Kn '■= K for simplicity (recall that K and n are fixed throughout). 
The root of the tree [0] represents the block K at time n, while [Ku • • • ^n-i] rep- 
resents the duplicate of block Ku at time u that affects block K at time n along the 
branch Ku — >■ Ku+i —)••••—)■ Kn~i — )• K (cf. Figure 5 for a simple illustration). 
The vertex set coiTcsponding to the computation tree is defined as 

/ = {[Ku • • • Kn-i]v -.[Ku--- Kn-i] eT,ve Ku} U{[0]v :veK}, 

and the corresponding state space is given by 

S = JJX\ X^*^" =X" for [t]vel. 
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It will be convenient in the sequel to introduce some additional notation. First, we 
will specify the children c(i) of an index i G / as follows: 

c{[Ku ■ ■ ■ Kn-i]v) := {\Ku-i ■ ■ ■ Kn-i]v' : K^-i G N{K^), v' E N{v)}, 

and similarly for c([0]t;). Denote the depth d{i) and location v{i) of i G / as 

d{[Ku---Kn-i]v):=u, d{[0]v) := n, v{[t]v):=v. 

We define the index set of non-leaf vertices in I as 

/+ := {i G / : < d{i) < n}, 

and the set of leaves of the tree T as 

To := {[Ko • • • Kn-i] : G N{Ks+i) for < s < n}. 

Finally, it will be natural to identify [t] G T with the corresponding subset of /: 

\Ku--- Kn^i] = {[Ku--- Kn-i]v : V G K^}, 

together with the analogous identification for [0] . 

We now define the probability measures p, p on S as follows: 

p{A) = 
~p{A) = 

Y{^^,^p-^\x^'^'\x')g-^^{x\Y'^l^)r^^{dx') n[i]ero ^^\dx^'^) 

where we write ;/[^o- -^n-i] ._ ^Kq ^[Ko---Kn-i] ._ ^Kq simphcity. Then, 
by construction, the measure B^F^ • • • Fi^ coincides with the marginal of p on the 
root of the computation tree, while B^^ F„ • • • Fii/ coincides with the marginal of p 
on the root of the computation tree. In particular, we obtain 

||F„ • • • Fip - F„ • • • Fii/|| J = \\p - p||[0]j. 

We will use Theorem 3.1 to obtain a bound on this expression. 

Throughout the remainder of the proof, we specialize to the case that p = 6z 
and u = 6z'- To apply Theorem 3.1, we must bound the quantities Cij and bi with 
i = [Ku ■ ■ ■ Kn-i]v and j = [K'^, ■ ■ ■ Kl^_i]v'. We distinguish three cases. 
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Case u = 0. As fi = 6z is nonrandom we evidently have p], = 5z^, so that 
Cij = 0. On the other hand, as p\. = 5z'^, we cannot do better than hi < 2. 
Case < u < n. Now we have 

pliA)=pi{A) = 

J 1a{x^) g\x\Y:) p\x<^\x^) Ueeu-Aecji) p'^^'Kx^'), x') r{dx') 
j g-{x\Y^)p-{x^i^,xi)Y\t^j^,,^^^i^p''^^\x-i^),x^)r{dx') 

Thus bi = 0. Moreover, by inspection, p\ does not depend on x^ except in the 
following cases: j G c{i); i € c{j); j € c{i) for some ^ G /+ such that i G c(£). 
As card c{£) < A for every ^ G /+, we estimate using Lemma 4.1 



Cij < ( 



ifjGc(i), 
ifiG c(j), 

1_^2A ifjGUg,^:.gcWc(^), 

otherwise. 



This yields 



^ ^m{i)-dU)\c,^ < 2(1 _ e2)e^A + (1 - e2A)A2 < 3(1 _ e^^)e^A'' 

where we have used that (3 > and A > 1 in the last inequality. 
Case u = n. Now i = [0]v, so we have 



pUa) = pHA) 



J lAix') g'^ix^Y^) p'^ix^'lx') ^P'^idx' 



J g''{x\Y;^)p^{x<^),x^)'il^''{dx'-) 
Arguing precisely as above, we obtain 6j = and 

Combining the above three cases, we obtain 

maxV e^l'^«-'^(^)la, < 3(1 - e^'^)ef'A'' = - 



by the assumption of the Proposition. Thus by Theorem 3.1 

||F„ • • • fiSz - F„ - • • PiSz'Wj = \\p- p\\[0]j < 4 card J e"^", 

where we have used Lemma 4.3 with m{i,j) = f3\d{i) — d{j)\. The proof is com- 
pleted by extending to general s < n as in the proof of Proposition 4.4. □ 
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The proof of Proposition 4. 15 was simplified by the fact that the resulting bound 
holds uniformly for all point mass initial conditions (this could be used to obtain a 
uniform bound for all initial measures along the same lines as the proof of Corol- 
lary 4.7). To obtain a bound on the variance term, however, we require a more 
precise stability bound for the block filter that provides explicit control in terms of 
the initial conditions. We will shortly deduce such a bound from Proposition 4.15. 
Before we can do so, however, we must prove a refinement of Lemma 4.2. 

Lemma 4.16. Let p = p^ ® ■ ■ ■ ® p'^ and v = ® ■ ■ ■ ® v'^ be product 
probability measures on S = x • • • x S'^, and let K -.^ ^ Wbe a bounded and 
strictly positive measurable function. Define the probability measures 



J lAix)A{x)fi{dx) 



J lAix)A{x)i'{dx) 



fA{x)p{dx) ' ' ' fA{x)u{dx) ■ 

Suppose that there exists a constant e > such that the following holds: for every 
i = 1, . . . ,d, there is a measurable function A* : S — t- M such that 

£A\x) < A{x) < e-'^A'ix) forallxeS 

and such that A*(x) = A*(x) whenever = 5{i>-..,4\{i}. jhen 

„ d 



I /"A 



Proof. Define for i = 0, 



, d the measures 



Pi ■= f 



ly' (g) p 



i+l 



P 



piA^) 



_ / lA{x)A{x)pi{dx) 



jA{x)pi{x) 



(by convention, po = p and pd = v). Then we can estimate 

d 

\\Pk - T^hW < ^ IIP'i.A - Pi~l,A\\- 
1=1 

Now note that we can estimate for | / 1 < 1 

1 



\PiMf) - Pi-1,A(/)I < 



epi{Ai 



p,(/A)-p,_i(/A)| + |p,(A)-p,_i(A)| 



as in the proof of Lemma 4.2. Moreover, we can write 



|p,(/A)-p,_i(/A)| 
\pi{A) - pi_i{A)\ 



e 



£ 



f{xy{dx') 

g\xy{dx') 



f\x)p\dx' 
g\x)p\dx^: 
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where /* and are functions on defined by 

p(x^) :=— ^ / f^x)Kix)u^^dx^)■■■v^^^(dx'~^)^l'+^idx^+^)■■■^l'^(dx'^), 

pii^n J 

g\x') :=— ^ I A{x)u\dx^)---u'-\dx'-^)fi'+\dx'+^)---fi'^{dx'^). 

Evidendy |/*| < 1 and \g^\ < 1, and the proof follows directly. □ 
We can now obtain a stability bound with control on the initial conditions. 
Proposition 4.17. Suppose there exists e > with 

e <p"{x,z'') <e~^ forallveV,x,z€X 

such that 

e > eo = I 1 



6A2 

Let 15 = - log6A2(l - e^^^) > 0. Then for any product probability measures 
we have 

e2|JC|, 



|F^---F,+i/i- F^---F,+i;y||j < card J e ^ aft-H/i^" - z^" 



for every s < n, K £ %, and J (1 K. Here {aK)Kex <^f^ nonnegative integers, 
depending on J and n — s only, such that ^j^^x ^ A 



n—s 

X ■ 



Proof. We fix s = 0, n > 0, A' G IK, J C K as in the proof of Proposition 
4.15, and adopt the notation used there. Define the functions 



:= / lA{x^^^')llp"^'^{x<'\x')g'''^'Hx\Y^lf^)r^^{dx'), 
h{x^^):= I Wp"^\x'^^\x')g-^\x\Yjl^)r^^{dx') 
on the leaves Tq of the computation tree, for every measurable A C X"^. Then 

(Fn---Fi/.)(A) = \ 7 ^ ; = / ^^^^-^dx^o^, 
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where we define the measure 



The measure i) is define analogously, and we have 



|Fn • • • Fi^ - Fn • • • Fil/|| J 



2 sup 



hA 



hA ,~ 
— — di' 
h 



where the supremum is taken only over measurable sets. But note that hj\/h is 
precisely the filter obtained when the initial condition is a point mass on the leaves 
of the computation tree (albeit not with the special duplication pattern induced by 
the unravelling of the original model; however, this was not used in the proof of 
Proposition 4.15). Therefore, the proof of Proposition 4.15 yields 



2 sup sup 



hA{z) hA{z) 



< 4 card J e 



-I3n 



h{z) h{z) 

In particular, using the identity — < \ osc / ||;U — v\\, we obtain 
||Fn • • • Fi/_f — F.„ • • • Fii/||j < 2card Je~^" — 

We now aim to apply Lemma 4.16 to estimate — i>||. 

To this end, consider a block [t\ G Tq. The integrand in the definition of h{x'^°) 
depends only on through the terms p'"^^\x'^^^\ x^) with c(i) n [t] ^ 0. If we 
write [t] = [Kq • • • Kn-i], then c(i) n [t] 7^ requires at least i G [Ki • • • Kn-i] 
and therefore card{z E /+ : c{i) Ci [t] 7^ 0} < card A'l < \X\oo- Thus we have 



for every z G X^o and [t] G Tq, where 



p-^-'i^x''^"' ,x 



fie/+:c(i)n[t]=0 



''')llg^^'^{x\Y^g)r''>{dx') 



does not depend on x' ^. By Lemma 4.16, we obtain 



Ell" 

[t]GTo 



r2|3C|, 



,K'i 



where we define ax' = card{[i^o ■ ■ • ^n-i] G ^0 ■ Kq = K'}. As the compu- 
tation tree has a branching factor of at most Ax, we evidently have J^Kex '^^ ~ 
card To < A!^. The result therefore follows directly for the case s = 0, and the 
general case s < n is immediate as in the proof of Proposition 4.4. □ 
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We finally state the block filter stability bound in its most useful forni. 
Corollary 4. 18 (Block filter stability). Suppose there exists e > with 

e <p"{x,z'') <e-^ forallv £V, x,z £X 

such that 

I \ 1/2A 

Let/3 = - log 6AkA2(1 - e^^) > 0. 

Then for any (possibly random) product probability measures 

we have 

E[||F„ • • • F^+i^ - F„ • • • F^+ii^ll j] ^''^ 

for every s < n, K & %, and J ^ K. 

Proof. The result follows readily from Proposition 4.17 (note that we have 
now absorbed the branching factor AJ~* in the definition of /3). □ 

4.7. Bounding the variance. To complete the proof of Theorem 2.1, it now 
remains to bound the variance term |||7r„ — 7r„||| j uniformly in time. This is the goal 
of the present section. We will first obtain bounds on the one-step error, and then 
combine these with the block filter stability bound of Corollaiy 4.18 to obtain time- 
uniform control of the error. The main remaining difficulty is to properly account 
for the fact that Corollary 4.18 is phrased in teniis of the total variation norm || • || j, 
which is too strong to control the sampling eiTor (we do not know how to prove 
an analogous result to Corollary 4.18 in the weaker |||-||| j-norm). To this end, we 
retain one time step of the block filter dynamics in the one-step error (we control 
IIF^+iF^Trf.i - Fs+iFs7rf_i||i^ rather than |||Fs7r^_i - Fs7rf_i|||^), which allows 
us to exploit the fact that the dynamics P has a density. 

Let us begin with the most trivial result: a one-step bound in the |||-||| j-norm. 
This estimate will be used to bound the error in the last time step s = n. 

Lemma 4.19 (Samphng error, s = n). Suppose there exists k > such that 

K < y"") < for all v eV, x eX, y eY. 



e> £o 
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Then 

max|||F„7r^_;^ - F„7r^_J| < = — . 

Proof. Note that 

IF tt'' - F tt^ III - IIIC-^B^Ptt^ - C^B^S^Ptt'' III 

yriT^n-l '^n^n-lllli^ — lll^^n ° ° ^'^n-llll 

9^—2 card K 

< 2k-2 card X III p^^_^ _ S^Pvr^^JI < 



N 



where the first inequahty is Lemma 4.2 and the second inequality follows from the 
simple estimate — S^/.f||| < l/\fN that holds for any probabihty ^. □ 

For the eixor in steps s < n, the requisite one-step bound (Proposition 4.22) is 
more involved. Before we prove it, we must first introduce an elementary lemma 
about products of empirical measures that will be needed below. 

Lemma 4.20. For any probability measure /i, we have 

H' III — 



where p' = jj J2k=i • • • > -^N cire i.i.d. ~ ^. 

Proof. We assume throughout that N > d? without loss of generality (other- 
wise the bound is trivial). Let |/| < 1 be a measurable function. Then 



N 

^d' ^ 

ki,...,kd=l 



We begin by bounding 

TV N 
ki,...,ki=lk{,...,k'^=l 

where 

Fki,...,ka '■= fiXki,- ■ ■ ,Xkj) - E[/(Xfcj, . . . ,Xkj)]. 
Note that E[Fk,,...,k,Fk'^,...,k'J = when {h, . . . ,kd} n {k[, . . . ,k'^} = 0. Thus 

4 ^ AT 

Var[/i^^(/)] < ^ ^ ^ l{k„...,k,}n{k[,...,k',}^0, 
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where we use ... < 2. But for each choice of ki, kd, there are at least 
(iV — dy choices of k[, . . . ,k'^ such that {ki, . . . , k^} n {k'l, . . . , k'^} = 0, so 

We can therefore estimate 

< ||//^'^-E[/i^'^]||+ 



N 

It remains to estimate the first term. To this end, note that E[/(Xfcj , . . . , Xk,J] = 
whenever A;i / • • • / A;„. Therefore, we evidently have 

1 ^ 

iE[An/)]-/^n/)i<^ E \nfiXk,,---,x,j]-n^V)\ 

fci,...,fcd=i 



Af^ (A^ - - V V N J J - N ' 
But as > (i^, we have (f/N < d/\fN. The result follows. □ 
This result will be used in the following form. 
Corollary 4.21. For any subset of blocks L ^%,we have 



for every probability measure ^ on\ and s > 1. 

Proof. Write (1 := S^/U and d = card£, and let us enumerate the blocks 
L = {Ki, . . . , Kd}. Then for any bounded function / : R, we can write 

((8)A'e/:B^M)(/) = I f{x^\...,x^'')fi{dxi)---fi{dxd), 

((g)^,^B^S^M)(/) = J f{x^\...,x^'')fi{dxi)---f,{dxd). 

Thus evidently 

and the result follows from Lemma 4.20. □ 



LOCAL PARTICLE FILTERS 6 1 

We now proceed to prove a one-step error bound for time steps s < n. 

Proposition 4.22 (Sampling eiTor, s < n). Suppose there exist e,K > with 

e < < e-\ k < g"" {x\y") < k'^ ^ v e V, x, z e X, y e Y. 

Then 



N 



for every < s < n. 

Proof. We begin by bounding using Lemma 4.2 

||F,+iF,7r.^_i - F,+iM.^_il|/< = ||Cf+iB^PF,^,^_i - Cf+iB^PM.^-^ill 

< 2K"^l^l°°||B^PFs7r^_i - B^PF,^^_J. 

Now note that 

(B^PF,7r^_,)(dx^) _ 

(B^PF,vr^_,)(dx^) 
^^{dx^) 

/n.gA-p"(^,x")ni.^giv(K)n.^.i.^g^'(^''',>;^')(B^'s^pei)(^^^') 

mK'emK)U.'eK'9''\z''\Yr){B^'S^Pn^^,){dzn 
where if)^ {dx^) := YIvgk '^^{dx'"), and we can write 
||B^PF,7r,^i- B^PF.TT^JI = 

(B^PF,7r,^i)((ix^) (B^PF,7r^i)(^Zx^^ 



/ 



ijK{dx^^ 



i^^idx^^ 



We therefore have by Minkowski's integral inequality 

E[||B^^PF,7r^i-B^PF,7rr-if]'/' 



E 



(B^PF,7r,^_i)(dx^) (B^PF,7r,^_i)(dx^) 



i;^{dx^) 



1/2 



{dx 



<V (X^) sup E 



(B^PF,^,^_i)(dx^^) (B^PF,7r^i)(dx^) 



'tp^{dx 



^^{dx^^ 



1/2 



- II 
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As we have 
and 

^m^A^ < n n a^'iz-^Yf) < K-i^i-^^, 

K'£N{K) v'eK' 

we can apply Lemma 4.2 to estimate 

S K \\\v9K'eN{K)° v9ki&n(K)° ^ ^ 

By Corollary 4.21 (appUed conditionally given vr^_x)» we obtain 

~ o 1/9 8ATre-2|3CU -2|3C|ooA3c 

E[||B^PF.vr^_, - B^PF.vr^_,f ] < ^ . 

The result follows immediately. 
We finally put everything together. 

Theorem 4.23 (Variance term). Suppose there exist e, k > with 

e <p''{x,z'') <e-^, < g" {x^y") < K.-^ G x, z E X, y e Y 
such that 

( 1 \ V2A 

e > £0 = 1 



V 6A3cA2 
Letj3 = - log6A3cA2(l - e"^^) > 0. Then 

\\\K-T^n\\\j < card J ^= 

for every n>0, x£^, KgJC and J <^ K. 

Proof. We begin with the elementary error decomposition 



I'^n - •^nlll J < X] 111^" ■ ■ ■ Fs+lFsVTs-i - F„ • • • f g+l^ sT^s-l\\\ J- 
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The term s = n in this sum is bounded in Lemma 4. 19: 

9^— 2|JC|oo 

lllp _ p III <^ Z_ 

WnT^n-X ~ ■""■^n-llll J — "^j^ • 

The term s = n — 1 is bounded in Proposition 4.22: 

IIIC C ^2: c c III ^ -^"^X ^ 2^ 

III rnrn-l"s-l ~ '"nrn-l"s-l ||| j ^ T^y • 

V iV 

Now suppose s < n — 1. Then we can estimate using Corollary 4.18 

|||Fn • • • Fs+iFs7r^_;^ — • • • Fs+iFs7r^_]^||| J. 

< cardJe-^("-^-i) maxE[||F,+iF,7rf_i - F,+iF,7r,^„i||y 

Applying Proposition 4.22 yields 

|||Fn • • • Fs+lFs7rg_]^ — F„ • • • Fs+lFsTTg.^lll J 



< card J e-'^^"-^-^) 



Substituting the above three cases into the error decomposition and summing the 
geometric series yields the statement of the Theorem. □ 



Theorems 4.14 and 4.23 now immediately yield Theorem 2.1. 
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